Abstract

The capacity of a classical-quantum channel (or, in other words, the classical capacity of a quantum channel) is considered in the most general setting, where no structural assumptions such as the stationary memoryless property are made on a channel. A capacity formula as well as a characterization of the strong converse property is given, in parallel with the corresponding classical results of Verdú and Han, which are based on the so-called information-spectrum method. The general results are applied to the stationary memoryless case with or without cost constraints on inputs, whereby a deep relation between channel coding theory and hypothesis testing for two quantum states is elucidated.

1 Introduction

The channel coding theorem for a stationary and memoryless (classical-)quantum channel has been established by combining the direct part shown by Holevo [1] and Schumacher-Westmoreland [2] with the (weak) converse part, which goes back to the 1970s works of Holevo [3], [4]. This theorem is undoubtedly a landmark in the history of quantum information theory. At the same time, however, we should not forget that stationary memoryless channels are not the only class of quantum channels. It is indeed natural to think that many channels appearing in nature are neither stationary nor memoryless, even in an approximate sense.

In classical information theory, a capacity formula for the most general setting was given by Verdú and Han [5], based on the so-called information-spectrum method [6]. We show in this paper that a similar approach yields general formulas for the capacity of a classical-quantum channel (or, in other words, the classical capacity of a quantum channel) and related notions.

Let us take a brief look at the general features of the information-spectrum method in classical information theory. One of the main subjects of information theory is to characterize the asymptotic optimality of various types of coding problems by entropy-like information quantities. In the information-spectrum method, a coding problem is treated in the most general setting, without any structural assumptions such as the stationary memoryless property, and the asymptotic optimality is characterized by a limiting expression on information spectra (i.e., the asymptotic behavior of logarithmic likelihoods). Since the asymptotic optimization of coding is essentially solved by this characterization, rewriting the information-spectrum quantity as an entropy-like quantity for a specific situation is mostly a direct consequence of a limit theorem in probability theory, such as the law of large numbers, the Shannon-McMillan-Breiman theorem, ergodic theorems, or large deviation theorems. Such a framework brings not only generality but also transparency to the mathematical arguments. Indeed, investigating an existing coding theorem from the information-spectrum viewpoint often leads to a simpler proof.

Turning to quantum information theory, in spite of the recent remarkable progress of the field, the mathematical arguments used to prove theorems are often neither as transparent nor as unified as in the classical theory. For instance, the original proof of the direct part of the quantum channel coding theorem [2] is rather complicated, so that it is not easy to grasp the essence of the argument (see [7] for a different proof). Extending the information-spectrum method to the quantum case is an attractive subject, which brings the hope that proofs will be simplified and, more importantly, that both the optimality of coding systems and the limiting laws governing quantum stochastic situations will be understood in a transparent and comprehensive way.

In this paper, we pursue this subject for the quantum channel coding problem, whereby the quantum analogue of the Verdú-Han general formula is obtained. In addition, the formula is applied to the stationary memoryless case to yield a new proof of the quantum channel coding theorem. It should be noted here that, both in deriving the general formula and in applying it to the stationary memoryless case to get a nonasymptotic expression, there arise several mathematical difficulties to which the corresponding classical arguments are not immediately applicable. The difficulties in deriving the general formula are overcome by using the quantum Neyman-Pearson lemma [8], [9], [10] and a novel operator inequality (Lemma 2), while those in rewriting the formula into the known form in the stationary memoryless case are handled by invoking the asymptotic theory of hypothesis testing for two quantum states [11], [12], [13] (see the references of [13] for related results) as a kind of substitute for the weak law of large numbers. In particular, the inequality of Lemma 2 is expected to play a key role in analyzing measurements of the square-root type in general; indeed, it drastically simplifies the original proof of [1], [2], as mentioned in a later remark.

Historically, the present work is preceded by Ogawa's proof ^3] of the direct part of the quantum channel coding theorem, with an improved and simplified version being found in , which was actually the first remarkable result of the information-spectrum approach to the quantum channel coding problem and elucidated the close relation between channel coding and hypothesis testing in quantum information theory; see Remark El and Remark I. In the present paper, we clarify this relation from a more general viewpoint and make further developments to establish the information-spectrum method in quantum channel coding theory. These attempts lead us to a better understanding of why the quantum relative entropy plays important roles in both of these problems.

We should emphasize, however, that the present paper is not the final goal of the information-spectrum study of quantum channel capacity. Even though a general capacity formula has been given in terms of the quantum information spectrum, the way of applying it to the stationary memoryless case shown in this paper is not as straightforward as its classical counterpart. Indeed, if our concern is restricted to proving the coding theorem for stationary memoryless channels, the information spectrum appears to be a kind of detour at present; see Remarks [T3J [T7| and ITTn. In order to achieve the same level of simplicity and transparency as the classical information-spectrum method, and to further fulfill the above-mentioned hope for the quantum information-spectrum method, we will need more theoretical tools to analyze the quantum information spectrum.

The paper is organized as follows. In section 2, the notion of general classical-quantum channels is introduced and the coding problem for them is formulated. Section 3 is devoted to stating the main theorem, which gives the general capacity formula and the characterization of the strong converse property of a general channel, while the proof is given in section 5 based on some lemmas prepared in section 4. Stationary memoryless channels are treated in section 6 and section 7, the latter of which considers cost constraints on inputs, while a subsequent section is devoted to revisiting the decoder introduced by Holevo and Schumacher-Westmoreland in comparison with our decoder used to prove the general formulas. The final section gives some concluding remarks.

2 Capacity of general classical-quantum channels 

A quantum communication channel is generally composed of the following constructs: (separable) Hilbert spaces $\mathcal H_1$ and $\mathcal H_2$, which respectively represent the quantum systems on the sender's and the receiver's sides; a trace-preserving CP (completely positive) map $\Gamma$ from the trace-class operators on $\mathcal H_1$ to those on $\mathcal H_2$, which describes the change of the sent states; and a map $V: \mathcal X \to \mathcal S(\mathcal H_1)$, which represents the modulator that sets the input state to $V_x$ according to the value of the control variable $x \in \mathcal X$. Here $\mathcal S(\mathcal H)$ denotes the set of density operators on $\mathcal H$. When our concern is restricted to sending classical messages via the channel, however, only the composite map $\Gamma \circ V: \mathcal X \to \mathcal S(\mathcal H_2)$ is relevant, and hence in the sequel we call a map $W: \mathcal X \ni x \mapsto W_x \in \mathcal S(\mathcal H)$ a classical-quantum channel, or simply a channel. Here $\mathcal X$ is an arbitrary (finite or infinite) set and $\mathcal H$ is an arbitrary Hilbert space. This definition corresponds to the classical one in which a channel is represented by a conditional probability $W: (x, y) \mapsto W(y \mid x)$ or, equivalently, by a map $W: x \mapsto W_x = W(\,\cdot \mid x)$.

Remark 1 In many papers treating the capacity of quantum memoryless channels (e.g., [H 12 El El CS]), only the case when $\mathcal X$ is a finite set is considered. Even though the restriction to the finite case may be sufficient to understand the essence of most (but not all) mathematical arguments for proving the capacity theorem, there is no reason to restrict ourselves to the finite case from the standpoint that the capacity is the maximum reliable transmission rate of all possible communication systems for a given quantum channel. Indeed, a particularly important infinite case is when $\mathcal X = \mathcal S(\mathcal H_1)$ and $W$ is a trace-preserving CP map.

Remark 2 The term "classical-quantum channel" has been given several different meanings in the literature (cf. 1Q\). The present definition is similar to that of ^7], although some measure-theoretic assumptions were made there on both the set $\mathcal X$ and the mapping $x \mapsto W_x$ in order to consider a channel in a general and unified operator-algebraic setting.

Remark 3 As was pointed out in [TBI, the capacity problem for a channel $W: \mathcal X \to \mathcal S(\mathcal H)$ depends only on its range $\{W_x \mid x \in \mathcal X\}$, and we can adopt the alternative definition in which an arbitrary subset of $\mathcal S(\mathcal H)$ is called a channel. In other words, we can assume, if we wish, with no loss of generality that every $W$ appearing in the sequel is the identity map on a subset $\mathcal X \subset \mathcal S(\mathcal H)$. The reason for treating a map $W$ instead of its range is mainly that it enables us to introduce more readable and natural notation.

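For an arbitrary channel $W: \mathcal X \to \mathcal S(\mathcal H)$, we call a triple $(N, \varphi, Y)$ a code for $W$ when it consists of a natural number (size) $N$, a mapping (encoding) $\varphi: \{1, \dots, N\} \to \mathcal X$, and a POVM (decoding) $Y = \{Y_i\}_{i=1}^N$ on $\mathcal H$ such that $\sum_i Y_i \le I$, where $I - \sum_i Y_i$ corresponds to the failure of decoding, and we denote the totality of such codes by $\mathfrak C(W)$. For a code $\Phi = (N, \varphi, Y) \in \mathfrak C(W)$, the code size and the average error probability are represented as

$$|\Phi| \overset{\text{def}}{=} N \tag{1}$$

and

$$P_e[\Phi] \overset{\text{def}}{=} \frac{1}{N}\sum_{i=1}^N \Big(1 - \mathrm{Tr}\big[W_{\varphi(i)}\, Y_i\big]\Big). \tag{2}$$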
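
As a concrete reading of (1) and (2), the following minimal sketch evaluates the average error probability of a finite toy code. It assumes NumPy, represents states and POVM elements as matrices, and uses hypothetical names (`average_error_probability` and a two-state qubit channel chosen for illustration); it is not part of the paper's argument.

```python
import numpy as np

def average_error_probability(W, encoder, povm):
    """P_e[Phi] = (1/N) * sum_i (1 - Tr[W_{phi(i)} Y_i]), as in (2).

    W       : dict mapping each input letter x to a density matrix W_x
    encoder : list with encoder[i] = phi(i+1), the codeword of message i+1
    povm    : list of positive matrices Y_1, ..., Y_N with sum_i Y_i <= I
    """
    N = len(encoder)
    assert len(povm) == N, "one decoding operator per message"
    total = 0.0
    for x, Y in zip(encoder, povm):
        total += 1.0 - np.trace(W[x] @ Y).real
    return total / N

# Hypothetical qubit channel with two pure output states |0> and |+>.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
W = {0: np.outer(ket0, ket0), 1: np.outer(ketp, ketp)}

# Code of size N = 2 (message 1 -> x = 0, message 2 -> x = 1),
# decoded with the projective measurement {|0><0|, |1><1|}.
encoder = [0, 1]
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
print(average_error_probability(W, encoder, povm))
```

Here message 1 is always decoded correctly and message 2 is decoded correctly with probability 1/2, so the printed value is 0.25 up to rounding.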
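
Now let us proceed to the asymptotic setting. Suppose that we are given a sequence $\mathbf H = \{\mathcal H^{(n)}\}_{n=1}^\infty$ of Hilbert spaces and a sequence $\mathbf W = \{W^{(n)}\}_{n=1}^\infty$ of channels $W^{(n)}: \mathcal X^{(n)} \to \mathcal S(\mathcal H^{(n)})$. An important example is the stationary memoryless case, when $\mathbf H$ and $\mathbf W$ are defined from a Hilbert space $\mathcal H$ and a channel $W: \mathcal X \to \mathcal S(\mathcal H)$ as $\mathcal H^{(n)} = \mathcal H^{\otimes n}$, $\mathcal X^{(n)} = \mathcal X^n$ and $W^{(n)}_{x^n} = W_{x_1} \otimes \cdots \otimes W_{x_n}$ for $x^n = (x_1, \dots, x_n)$, which will be treated in sections 6 and 7. Except for those sections, however, we do not make any assumptions on the mutual relations among $\{\mathcal H^{(n)}\}$, $\{\mathcal X^{(n)}\}$ and $\{W^{(n)}\}$ for different $n$'s. Such an extremely general setting is one of the main features of the information-spectrum approach. The capacity of $\mathbf W$ is then defined as

$$C(\mathbf W) \overset{\text{def}}{=} \sup\Big\{R \;\Big|\; \exists\, \boldsymbol\Phi = \{\Phi^{(n)}\} \in \mathfrak C(\mathbf W),\ \liminf_{n\to\infty}\frac{1}{n}\log|\Phi^{(n)}| \ge R \ \text{and}\ \lim_{n\to\infty}P_e[\Phi^{(n)}] = 0\Big\}, \tag{3}$$

where $\mathfrak C(\mathbf W)$ denotes the totality of sequences of codes $\boldsymbol\Phi = \{\Phi^{(n)}\}_{n=1}^\infty$ such that $\Phi^{(n)} \in \mathfrak C(W^{(n)})$ for all $n$. We also introduce a "dual" of the capacity:

$$C^\dagger(\mathbf W) \overset{\text{def}}{=} \inf\Big\{R \;\Big|\; \forall\, \boldsymbol\Phi = \{\Phi^{(n)}\} \in \mathfrak C(\mathbf W),\ \liminf_{n\to\infty}\frac{1}{n}\log|\Phi^{(n)}| \ge R \ \text{implies}\ \lim_{n\to\infty}P_e[\Phi^{(n)}] = 1\Big\}. \tag{4}$$

Note that $C(\mathbf W) \le C^\dagger(\mathbf W)$ always holds. Following the terminology of classical information theory, we say that the strong converse holds for $\mathbf W$ when $C(\mathbf W) = C^\dagger(\mathbf W)$.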
3 Main results 

In this section we give general formulas for $C(\mathbf W)$ and $C^\dagger(\mathbf W)$, which are regarded as the quantum extensions of those for classical channel coding obtained by Verdú and Han [5]. The classical formulas were given in terms of some information-spectrum-theoretic quantities, and we first need to introduce quantum analogues of these concepts along the lines developed in [13].

For a self-adjoint trace-class operator $A$ with the spectral decomposition $A = \sum_i \lambda_i E_i$, where $\{\lambda_i\}$ are the eigenvalues and $\{E_i\}$ are the orthogonal projections onto the corresponding eigenspaces, we define

$$\{A \ge 0\} \overset{\text{def}}{=} \sum_{i:\,\lambda_i \ge 0} E_i \quad\text{and}\quad \{A > 0\} \overset{\text{def}}{=} \sum_{i:\,\lambda_i > 0} E_i. \tag{5}$$

These are the orthogonal projections onto the direct sums of the eigenspaces corresponding to nonnegative and positive eigenvalues, respectively. The projections $\{A \le 0\}$ and $\{A < 0\}$ are defined similarly.
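
Numerically, the projections in (5) can be read off an eigendecomposition. The sketch below assumes NumPy; the tolerance `tol` that decides which eigenvalues count as zero is a numerical convenience rather than part of the definition, and the function name is hypothetical.

```python
import numpy as np

def spectral_projection(A, strict=True, tol=1e-12):
    """Return {A > 0} (strict=True) or {A >= 0} (strict=False), as in (5).

    A is assumed to be a Hermitian matrix; eigenvalues within `tol`
    of zero are treated as zero.
    """
    eigvals, eigvecs = np.linalg.eigh(A)
    if strict:
        keep = eigvals > tol
    else:
        keep = eigvals >= -tol
    V = eigvecs[:, keep]          # orthonormal basis of the selected eigenspaces
    return V @ V.conj().T         # orthogonal projection onto their direct sum

A = np.diag([2.0, 0.0, -1.0])
P_pos = spectral_projection(A)                 # projection onto the eigenvalue 2
P_nonneg = spectral_projection(A, strict=False)  # eigenvalues 2 and 0
```

The remaining projections follow from the same function, e.g. $\{A \le 0\} = \{-A \ge 0\}$, so that $\{A > 0\} + \{A \le 0\} = I$.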
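
For any set $\mathcal X$, let $\mathcal P(\mathcal X)$ be the totality of probability distributions on $\mathcal X$ with finite supports. That is, an element $P$ of $\mathcal P(\mathcal X)$ is a function $\mathcal X \to [0, 1]$ such that its support $\mathrm{supp}(P) \overset{\text{def}}{=} \{x \mid P(x) > 0\}$ is a finite set and that

$$\sum_{x \in \mathrm{supp}(P)} P(x) = 1.$$

Let the totality of sequences $\mathbf P = \{P^{(n)}\}_{n=1}^\infty$ of $P^{(n)} \in \mathcal P(\mathcal X^{(n)})$ be denoted by $\mathcal P(\mathbf X)$, and the totality of sequences $\boldsymbol\sigma = \{\sigma^{(n)}\}_{n=1}^\infty$ of $\sigma^{(n)} \in \mathcal S(\mathcal H^{(n)})$ by $\mathcal S(\mathbf H)$. Given $\mathbf P \in \mathcal P(\mathbf X)$ and $\boldsymbol\sigma \in \mathcal S(\mathbf H)$, let

$$\overline J(\mathbf P, \boldsymbol\sigma, \mathbf W) \overset{\text{def}}{=} \inf\Big\{a \;\Big|\; \lim_{n\to\infty}\sum_{x^n \in \mathcal X^{(n)}} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}\sigma^{(n)} > 0\}\big] = 0\Big\},$$

$$\underline J(\mathbf P, \boldsymbol\sigma, \mathbf W) \overset{\text{def}}{=} \sup\Big\{a \;\Big|\; \lim_{n\to\infty}\sum_{x^n \in \mathcal X^{(n)}} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}\sigma^{(n)} > 0\}\big] = 1\Big\},$$

and

$$\overline I(\mathbf P, \mathbf W) \overset{\text{def}}{=} \overline J(\mathbf P, \mathbf W_{\mathbf P}, \mathbf W), \qquad \underline I(\mathbf P, \mathbf W) \overset{\text{def}}{=} \underline J(\mathbf P, \mathbf W_{\mathbf P}, \mathbf W),$$

where $\mathbf W_{\mathbf P}$ denotes the sequence $\{W^{(n)}_{P^{(n)}}\}_{n=1}^\infty$ of

$$W^{(n)}_{P^{(n)}} \overset{\text{def}}{=} \sum_{x^n \in \mathcal X^{(n)}} P^{(n)}(x^n)\,W^{(n)}_{x^n}. \tag{6}$$

Note that $\overline I(\mathbf P, \mathbf W)$ and $\underline I(\mathbf P, \mathbf W)$ are quantum analogues of the spectral sup- and inf-information rates [5]:

$$\overline I(\mathbf X; \mathbf Y) = \text{p-}\limsup_{n\to\infty}\frac{1}{n}\log\frac{W^{(n)}(Y^{(n)} \mid X^{(n)})}{P_{Y^{(n)}}(Y^{(n)})},$$

$$\underline I(\mathbf X; \mathbf Y) = \text{p-}\liminf_{n\to\infty}\frac{1}{n}\log\frac{W^{(n)}(Y^{(n)} \mid X^{(n)})}{P_{Y^{(n)}}(Y^{(n)})},$$

where $\mathbf Y = \{Y^{(n)}\}$ is the sequence of random variables obtained as the outputs of the channels $\mathbf W = \{W^{(n)}\}$ for a sequence of input random variables $\mathbf X = \{X^{(n)}\}$.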
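
For intuition about the quantities inside the limits defining $\overline J$ and $\underline J$, the following brute-force sketch evaluates the finite-$n$ term $\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}]$ for an i.i.d. input on a stationary memoryless qubit channel. It assumes NumPy, the two qubit states are made up for illustration, the helper names are hypothetical, and the cost grows like $|\mathcal X|^n$, so it is only meant for very small $n$.

```python
import itertools
import numpy as np

def spectral_projection(A, tol=1e-12):
    """{A > 0}: projection onto eigenvectors of Hermitian A with eigenvalue > tol."""
    w, V = np.linalg.eigh(A)
    V = V[:, w > tol]
    return V @ V.conj().T

def kron_all(mats):
    """Tensor product of a list of matrices."""
    out = np.array([[1.0]])
    for m in mats:
        out = np.kron(out, m)
    return out

def spectrum_tail(W, P, n, a):
    """sum_{x^n} P^n(x^n) Tr[W_{x^n} {W_{x^n} - e^{na} W_P^{(n)} > 0}]
    for the i.i.d. input P^n, so that W_P^{(n)} = W_P tensored n times."""
    letters = list(P)
    W_P = sum(P[x] * W[x] for x in letters)
    sigma_n = kron_all([W_P] * n)
    total = 0.0
    for xs in itertools.product(letters, repeat=n):
        prob = np.prod([P[x] for x in xs])
        Wx = kron_all([W[x] for x in xs])
        proj = spectral_projection(Wx - np.exp(n * a) * sigma_n)
        total += prob * np.trace(Wx @ proj).real
    return total

# Hypothetical full-rank qubit states and a uniform input distribution.
W = {0: np.diag([0.9, 0.1]), 1: np.array([[0.5, 0.4], [0.4, 0.5]])}
P = {0: 0.5, 1: 0.5}
for a in (0.0, 0.1, 0.2, 0.3):
    print(a, spectrum_tail(W, P, 3, a))
```

As a function of $a$ this term is nonincreasing, from 1 (for very negative $a$) to 0 (for very large $a$); $\underline I$ and $\overline I$ record where this transition settles as $n \to \infty$.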

Remark 4 The projection $\{W^{(n)}_{x^n} - e^{na}\sigma^{(n)} > 0\}$ in the definitions above can be replaced with $\{W^{(n)}_{x^n} - e^{na}\sigma^{(n)} \ge 0\}$ or, more generally, with an arbitrary self-adjoint operator $S$ satisfying

$$\{W^{(n)}_{x^n} - e^{na}\sigma^{(n)} > 0\} \le S \le \{W^{(n)}_{x^n} - e^{na}\sigma^{(n)} \ge 0\}.$$

This ambiguity does not influence the definitions of the above quantities; see [13].
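
Now we have the following theorem.

Theorem 1

$$C(\mathbf W) = \max_{\mathbf P \in \mathcal P(\mathbf X)} \underline I(\mathbf P, \mathbf W) \tag{7}$$

$$C(\mathbf W) = \max_{\mathbf P \in \mathcal P(\mathbf X)}\ \min_{\boldsymbol\sigma \in \mathcal S(\mathbf H)} \underline J(\mathbf P, \boldsymbol\sigma, \mathbf W), \tag{8}$$

and

$$C^\dagger(\mathbf W) = \max_{\mathbf P \in \mathcal P(\mathbf X)} \overline I(\mathbf P, \mathbf W) \tag{9}$$

$$C^\dagger(\mathbf W) = \max_{\mathbf P \in \mathcal P(\mathbf X)}\ \min_{\boldsymbol\sigma \in \mathcal S(\mathbf H)} \overline J(\mathbf P, \boldsymbol\sigma, \mathbf W). \tag{10}$$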

Remark 5 The formula obtained by Verdú and Han [5] for a sequence of classical channels $\mathbf W = \{W^{(n)}\}_{n=1}^\infty$ is

$$C(\mathbf W) = \sup_{\mathbf X} \underline I(\mathbf X; \mathbf Y), \tag{11}$$

where the supremum is taken over all possible input sequences $\mathbf X = \{X^{(n)}\}$, and $\mathbf Y = \{Y^{(n)}\}$ denotes the output sequence corresponding to $\mathbf X$. In addition, they showed that the strong converse holds for $\mathbf W$ if and only if $\sup_{\mathbf X}\underline I(\mathbf X; \mathbf Y) = \sup_{\mathbf X}\overline I(\mathbf X; \mathbf Y)$. In the process of proving this, they essentially showed that

$$C^\dagger(\mathbf W) = \sup_{\mathbf X} \overline I(\mathbf X; \mathbf Y), \tag{12}$$

even though $C^\dagger(\mathbf W)$ does not explicitly appear in that paper. Note that the suprema in these expressions can be replaced with maxima (see Remark 7 below), and our expressions (7) and (9) are the quantum extensions of (11) and (12).

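Remark 6 In the classical case, let

$$\overline J(\mathbf X, \bar{\mathbf Y}, \mathbf W) \overset{\text{def}}{=} \text{p-}\limsup_{n\to\infty}\frac{1}{n}\log\frac{W^{(n)}(Y^{(n)} \mid X^{(n)})}{P_{\bar Y^{(n)}}(Y^{(n)})},$$

$$\underline J(\mathbf X, \bar{\mathbf Y}, \mathbf W) \overset{\text{def}}{=} \text{p-}\liminf_{n\to\infty}\frac{1}{n}\log\frac{W^{(n)}(Y^{(n)} \mid X^{(n)})}{P_{\bar Y^{(n)}}(Y^{(n)})},$$

where $\bar Y^{(n)}$ is an arbitrary random variable with a probability distribution $P_{\bar Y^{(n)}}$ taking values in a common set with $Y^{(n)}$. Then we have

$$\begin{aligned}
\overline J(\mathbf X, \bar{\mathbf Y}, \mathbf W) &\ge \text{p-}\limsup_{n\to\infty}\frac{1}{n}\log\frac{W^{(n)}(Y^{(n)} \mid X^{(n)})}{P_{Y^{(n)}}(Y^{(n)})} + \text{p-}\liminf_{n\to\infty}\frac{1}{n}\log\frac{P_{Y^{(n)}}(Y^{(n)})}{P_{\bar Y^{(n)}}(Y^{(n)})} \\
&= \overline I(\mathbf X; \mathbf Y) + \underline D(\mathbf Y \,\|\, \bar{\mathbf Y}), \\
\underline J(\mathbf X, \bar{\mathbf Y}, \mathbf W) &\ge \text{p-}\liminf_{n\to\infty}\frac{1}{n}\log\frac{W^{(n)}(Y^{(n)} \mid X^{(n)})}{P_{Y^{(n)}}(Y^{(n)})} + \text{p-}\liminf_{n\to\infty}\frac{1}{n}\log\frac{P_{Y^{(n)}}(Y^{(n)})}{P_{\bar Y^{(n)}}(Y^{(n)})} \\
&= \underline I(\mathbf X; \mathbf Y) + \underline D(\mathbf Y \,\|\, \bar{\mathbf Y}),
\end{aligned}$$

where $\underline D(\mathbf Y \| \bar{\mathbf Y})$ is the spectral inf-divergence rate jE] between $\mathbf Y$ and $\bar{\mathbf Y}$. Since $\underline D(\mathbf Y \| \bar{\mathbf Y}) \ge 0$ always holds, with equality when $\bar{\mathbf Y} = \mathbf Y$, we have

$$\overline I(\mathbf X; \mathbf Y) = \min_{\bar{\mathbf Y}} \overline J(\mathbf X, \bar{\mathbf Y}, \mathbf W) \tag{13}$$

and

$$\underline I(\mathbf X; \mathbf Y) = \min_{\bar{\mathbf Y}} \underline J(\mathbf X, \bar{\mathbf Y}, \mathbf W), \tag{14}$$

which yield expressions similar to (8) and (10) from (11) and (12). In the quantum case, on the other hand, it is not clear whether the corresponding equations $\underline I(\mathbf P, \mathbf W) = \min_{\boldsymbol\sigma}\underline J(\mathbf P, \boldsymbol\sigma, \mathbf W)$ and $\overline I(\mathbf P, \mathbf W) = \min_{\boldsymbol\sigma}\overline J(\mathbf P, \boldsymbol\sigma, \mathbf W)$ generally hold. Nevertheless, the expressions for $C(\mathbf W)$ and $C^\dagger(\mathbf W)$ in Theorem 1 always hold.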

Remark 7 If a classical or quantum information-spectrum quantity includes a sequence of variables, the supremum (infimum, resp.) with respect to the variables (e.g., in (11) and (12)) can always be replaced with the maximum (minimum) due to the following lemma. Thus we do not need to care about the attainability of such a supremum (infimum).

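Lemma 1 Suppose that we are given a sequence $\{\mathcal F_n\}_{n=1}^\infty$, where each $\mathcal F_n$ is a nonempty set consisting of monotonically nondecreasing functions defined on $\mathbb R$, and let $\mathcal F$ denote the totality of sequences $\mathbf f = \{f_n\}_{n=1}^\infty$ of functions $f_n \in \mathcal F_n$; in other words, $\mathcal F$ is the direct product $\prod_{n=1}^\infty \mathcal F_n$ of $\{\mathcal F_n\}$. For each $\mathbf f \in \mathcal F$ and $x \in \mathbb R$, let

$$[\mathbf f]^-_x \overset{\text{def}}{=} \sup\Big\{a \;\Big|\; \limsup_{n\to\infty} f_n(a) \le x\Big\} \in \mathbb R \cup \{\infty, -\infty\},$$

$$[\mathbf f]^+_x \overset{\text{def}}{=} \inf\Big\{a \;\Big|\; \liminf_{n\to\infty} f_n(a) \ge x\Big\} \in \mathbb R \cup \{\infty, -\infty\}.$$

Then the suprema and infima

$$\sup_{\mathbf f}\,[\mathbf f]^-_x, \quad \sup_{\mathbf f}\,[\mathbf f]^+_x, \quad \inf_{\mathbf f}\,[\mathbf f]^-_x \quad\text{and}\quad \inf_{\mathbf f}\,[\mathbf f]^+_x$$

are always attained in $\mathcal F$.

Proof: See Appendix I. ■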
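
In the situation of Theorem 1, for instance, the lemma is applied to sequences of functions $\mathbf f = \{f_n\}_{n=1}^\infty$ of the form

$$f_n(a) = \sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}\sigma^{(n)} \le 0\}\big],$$

for which we have $[\mathbf f]^-_0 = \underline J(\mathbf P, \boldsymbol\sigma, \mathbf W)$ and $[\mathbf f]^+_1 = \overline J(\mathbf P, \boldsymbol\sigma, \mathbf W)$. Note that the monotonicity of these functions follows from an argument in section 3 of [TO].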

4 Lemmas for proving Theorem 1

We need three lemmas. The first one is the key operator inequality used to prove the second, while the second and third lemmas are directly used to prove the theorem. Throughout this paper, the generalized inverse of a nonnegative operator $A$ is simply denoted by $A^{-1}$; i.e., $A^{-1}$ is the nonnegative operator such that $AA^{-1} = A^{-1}A = P_A = P_{A^{-1}}$, where $P_A$ and $P_{A^{-1}}$ denote the orthogonal projections onto the ranges of $A$ and $A^{-1}$.

Lemma 2 For any positive number $c$ and any operators $0 \le S \le I$ and $T \ge 0$, we have

$$I - \sqrt{S+T}^{\,-1}\, S\, \sqrt{S+T}^{\,-1} \le (1+c)(I - S) + (2 + c + c^{-1})\,T. \tag{15}$$

Proof: Let $P$ be the orthogonal projection onto the range of $S + T$. Then $P$ commutes with both $S$ and $T$, and hence it is enough to prove

$$P\big[I - \sqrt{S+T}^{\,-1} S \sqrt{S+T}^{\,-1}\big]P \le P\big[(1+c)(I-S) + (2+c+c^{-1})T\big]P$$

and

$$P^\perp\big[I - \sqrt{S+T}^{\,-1} S \sqrt{S+T}^{\,-1}\big]P^\perp \le P^\perp\big[(1+c)(I-S) + (2+c+c^{-1})T\big]P^\perp,$$

where $P^\perp = I - P$. Since $P^\perp S = P^\perp T = P^\perp\sqrt{S+T}^{\,-1} = 0$, the second inequality reduces to $P^\perp \le (1+c)P^\perp$ and is trivial. Thus we have only to show the first one or, equivalently, to show (15) in the case when the range of $S + T$ is $\mathcal H$. Substituting $A = \sqrt T$ and $B = \sqrt T\,(\sqrt{S+T}^{\,-1} - I)$ into the general operator inequality $A^*B + B^*A \le c^{-1}A^*A + cB^*B$, which follows from $(A - cB)^*(A - cB) \ge 0$, we have

$$T(\sqrt{S+T}^{\,-1} - I) + (\sqrt{S+T}^{\,-1} - I)T \le c^{-1}T + c\,(\sqrt{S+T}^{\,-1} - I)\,T\,(\sqrt{S+T}^{\,-1} - I). \tag{16}$$

In addition, since the function $f(x) = \sqrt x$ is operator monotone and $0 \le S \le I$, we have

$$\sqrt{S+T} \ge \sqrt S \ge S. \tag{17}$$

Now the desired inequality is proved as follows:

$$\begin{aligned}
I - \sqrt{S+T}^{\,-1} S \sqrt{S+T}^{\,-1} &= \sqrt{S+T}^{\,-1}\, T\, \sqrt{S+T}^{\,-1} \\
&= T + T(\sqrt{S+T}^{\,-1} - I) + (\sqrt{S+T}^{\,-1} - I)T + (\sqrt{S+T}^{\,-1} - I)\,T\,(\sqrt{S+T}^{\,-1} - I) \\
&\le (1 + c^{-1})\,T + (1+c)(\sqrt{S+T}^{\,-1} - I)\,T\,(\sqrt{S+T}^{\,-1} - I) \\
&\le (1 + c^{-1})\,T + (1+c)(\sqrt{S+T}^{\,-1} - I)(S+T)(\sqrt{S+T}^{\,-1} - I) \\
&= (1 + c^{-1})\,T + (1+c)\big(I + S + T - 2\sqrt{S+T}\big) \\
&\le (1 + c^{-1})\,T + (1+c)(I + S + T - 2S) \\
&= (1+c)(I - S) + (2 + c + c^{-1})\,T,
\end{aligned}$$

where the first inequality follows from (16), the second from $T \le S + T$, and the third from (17). ■
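
Lemma 2 can be sanity-checked numerically: the sketch below draws random $0 \le S \le I$ and $T \ge 0$ and reports the smallest eigenvalue of the difference between the two sides of (15), which the lemma says is nonnegative. This is a spot check under assumed random ensembles (NumPy, a fixed seed, hypothetical helper names), not a proof; the generalized inverse is taken on the range of $S + T$, following the convention stated above.

```python
import numpy as np

def psd_sqrt_pinv(A, tol=1e-12):
    """Generalized inverse of sqrt(A) for A >= 0, inverting only on the range of A."""
    w, V = np.linalg.eigh(A)
    inv = np.array([1.0 / np.sqrt(x) if x > tol else 0.0 for x in w])
    return (V * inv) @ V.conj().T   # scales column j of V by inv[j]

def lemma2_gap(S, T, c):
    """Smallest eigenvalue of RHS - LHS of (15); Lemma 2 says it is >= 0."""
    I = np.eye(S.shape[0])
    X = psd_sqrt_pinv(S + T)
    lhs = I - X @ S @ X
    rhs = (1 + c) * (I - S) + (2 + c + 1 / c) * T
    return np.linalg.eigvalsh(rhs - lhs).min()

rng = np.random.default_rng(0)

def random_instance(d):
    """S = U diag(s) U^* with s in [0, 1] gives 0 <= S <= I; T = G G^* >= 0."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    S = U @ np.diag(rng.uniform(0, 1, size=d)) @ U.conj().T
    T = 0.1 * G @ G.conj().T
    return S, T

gaps = [lemma2_gap(*random_instance(4), c) for c in (0.1, 1.0, 10.0) for _ in range(20)]
print(min(gaps))  # expected to be nonnegative up to rounding
```

A singular example such as $S = \mathrm{diag}(1, 0)$, $T = 0$ exercises the generalized inverse: there both sides of (15) vanish on the range of $S$.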



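Lemma 3 For any $n \in \mathbb N$, $a \in \mathbb R$, $N \in \mathbb N$, $P^{(n)} \in \mathcal P(\mathcal X^{(n)})$ and $c > 0$, there exists a code $\Phi^{(n)} \in \mathfrak C(W^{(n)})$ such that $|\Phi^{(n)}| = N$ and

$$P_e[\Phi^{(n)}] \le (1+c)\sum_{x^n \in \mathcal X^{(n)}} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} \le 0\}\big] + (2+c+c^{-1})\,e^{-na}N, \tag{18}$$

where $W^{(n)}_{P^{(n)}}$ is defined by (6).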



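Proof: We prove the lemma by a random coding method. Given $n$, $a$, $N$, $P^{(n)}$ and an encoder $\varphi^{(n)}: \{1, \dots, N\} \to \mathcal X^{(n)}$, define the decoding POVM $Y^{(n)} = \{Y^{(n)}_i\}_{i=1}^N$ by

$$Y^{(n)}_i \overset{\text{def}}{=} \sqrt{\textstyle\sum_{j=1}^N \pi_j}^{\,-1}\ \pi_i\ \sqrt{\textstyle\sum_{j=1}^N \pi_j}^{\,-1}, \tag{19}$$

where

$$\pi_i \overset{\text{def}}{=} \big\{W^{(n)}_{\varphi^{(n)}(i)} - e^{na}W^{(n)}_{P^{(n)}} > 0\big\}. \tag{20}$$

Denoting the average error probability $P_e[\Phi^{(n)}]$ of the code $\Phi^{(n)} = (N, \varphi^{(n)}, Y^{(n)})$ by $P_e[\varphi^{(n)}]$, we have

$$P_e[\varphi^{(n)}] = \frac{1}{N}\sum_{i=1}^N \mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}(I - Y^{(n)}_i)\big] \le \frac{1}{N}\sum_{i=1}^N \mathrm{Tr}\Big[W^{(n)}_{\varphi^{(n)}(i)}\Big\{(1+c)(I - \pi_i) + (2+c+c^{-1})\sum_{j \ne i}\pi_j\Big\}\Big], \tag{21}$$

which follows from Lemma 2 with $S = \pi_i$ and $T = \sum_{j \ne i}\pi_j$. Now suppose that an encoder $\varphi^{(n)}$ is randomly generated according to the probability distribution $\Pr(\varphi^{(n)}) = P^{(n)}(\varphi^{(n)}(1)) \cdots P^{(n)}(\varphi^{(n)}(N))$, so that the codewords $\varphi^{(n)}(1), \dots, \varphi^{(n)}(N)$ are independent and identically distributed according to $P^{(n)}$. The expectation $\mathrm E\big[P_e[\varphi^{(n)}]\big]$ under this distribution is then bounded from above as

$$\begin{aligned}
\mathrm E\big[P_e[\varphi^{(n)}]\big] &\le (1+c)\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} \le 0\}\big] \\
&\quad + (2+c+c^{-1})\,\frac{1}{N}\sum_{i=1}^N\sum_{j \ne i}\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{P^{(n)}}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}\big] \\
&\le (1+c)\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} \le 0\}\big] \\
&\quad + (2+c+c^{-1})\,N\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{P^{(n)}}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}\big],
\end{aligned} \tag{22}$$

where we used (21) and the independence of $\varphi^{(n)}(i)$ and $\varphi^{(n)}(j)$ for $j \ne i$. Substituting $A = W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}}$ into $\mathrm{Tr}[A\{A > 0\}] \ge 0$, the second term of (22) is further evaluated by

$$\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{P^{(n)}}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}\big] \le e^{-na}\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}\big] \le e^{-na}.$$

Hence the expectation of $P_e[\varphi^{(n)}]$ does not exceed the RHS of (18), and thus the existence of $\varphi^{(n)}$ for which the code $\Phi^{(n)} = (N, \varphi^{(n)}, Y^{(n)})$ satisfies (18) has been proved. ■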
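
The decoder (19)-(20) is a square-root measurement built from the projections $\pi_i$. The sketch below constructs it for a toy code, then checks that the resulting $Y_i$ form a POVM with $\sum_i Y_i \le I$ and that the per-code bound (21) holds. It assumes NumPy, takes $n = 1$ with made-up qubit states, and all names (`square_root_decoder`, `error_and_bound`) are hypothetical.

```python
import numpy as np

def spectral_projection(A, tol=1e-12):
    """{A > 0} for a Hermitian matrix A."""
    w, V = np.linalg.eigh(A)
    V = V[:, w > tol]
    return V @ V.conj().T

def psd_sqrt_pinv(A, tol=1e-12):
    """Generalized inverse of sqrt(A) for A >= 0."""
    w, V = np.linalg.eigh(A)
    inv = np.array([1.0 / np.sqrt(x) if x > tol else 0.0 for x in w])
    return (V * inv) @ V.conj().T

def square_root_decoder(codeword_states, W_P, a, n=1):
    """pi_i = {W_{phi(i)} - e^{na} W_P > 0} as in (20) and
    Y_i = sqrt(sum_j pi_j)^{-1} pi_i sqrt(sum_j pi_j)^{-1} as in (19)."""
    pis = [spectral_projection(Wi - np.exp(n * a) * W_P) for Wi in codeword_states]
    X = psd_sqrt_pinv(sum(pis))
    return pis, [X @ p @ X for p in pis]

def error_and_bound(codeword_states, pis, Ys, c):
    """Left-hand side (average error) and right-hand side of (21) for a fixed encoder."""
    N = len(codeword_states)
    d = codeword_states[0].shape[0]
    I = np.eye(d)
    pe, bound = 0.0, 0.0
    for i, (Wi, pi, Yi) in enumerate(zip(codeword_states, pis, Ys)):
        others = sum((pis[j] for j in range(N) if j != i), np.zeros((d, d)))
        pe += 1.0 - np.trace(Wi @ Yi).real
        bound += np.trace(Wi @ ((1 + c) * (I - pi) + (2 + c + 1 / c) * others)).real
    return pe / N, bound / N

# Hypothetical channel with three inputs and the uniform input distribution.
W = {0: np.diag([0.9, 0.1]), 1: np.array([[0.5, 0.4], [0.4, 0.5]]), 2: np.diag([0.1, 0.9])}
P = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
W_P = sum(P[x] * W[x] for x in P)
encoder = [0, 1, 2]                     # phi(1), phi(2), phi(3)
states = [W[x] for x in encoder]
pis, Ys = square_root_decoder(states, W_P, a=0.0)
for c in (0.5, 1.0, 2.0):
    print(c, error_and_bound(states, pis, Ys, c))
```

Because $\sum_i Y_i$ is the projection onto the range of $\sum_j \pi_j$, the POVM condition $\sum_i Y_i \le I$ holds automatically, and the residual $I - \sum_i Y_i$ plays the role of the failure outcome.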

Remark 8 In deriving the direct part of the general capacity formula for classical channels, Verdú and Han [5] invoked the so-called Feinstein's lemma (Theorem 1 in [5]; see the next remark), which ensures the existence of a code satisfying

$$P_e[\Phi^{(n)}] \le \mathrm{Prob}\Big\{\frac{1}{n}\log\frac{W^{(n)}(Y^{(n)} \mid X^{(n)})}{P_{Y^{(n)}}(Y^{(n)})} \le a\Big\} + e^{-na}N. \tag{23}$$

Lemma 3 above can be regarded as a quantum analogue of Feinstein's lemma, although the coefficients there are a bit larger.

Remark 9 Historically, it seems that Shannon [°"""j was the first to explicitly formulate the inequality (23). He used a random coding argument to prove that there exists a code whose average error probability satisfies (23). On the other hand, Blackwell et al. [211] showed that the same inequality can also be satisfied for the maximum error probability. They proved this by refining Feinstein's non-random packing argument, which is well known to have been used in the first rigorous proof of the coding theorem for discrete memoryless channels [21]. This course of events has led some people to call the theorem concerning (23) "Feinstein's lemma", sometimes only for the maximum error probability and sometimes for both criteria (cf. [5]). We note that the original proof of Feinstein does not yield the general capacity formula, and the refinement made by Blackwell et al. is essential in this respect. Our Lemma 3 corresponds to Shannon's result, while an attempt toward a quantum extension of the result of Blackwell et al. has been made in |T3*1 IT*"]. The result obtained there is unfortunately not general enough to prove the direct part of the general formula (7), but it is of particular interest in itself; see Remark IT""" below.



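Remark 10 Letting $A = \sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} \le 0\}\big]$ and $B = e^{-na}N$, the RHS of (18), which equals $(1+c)A + (2+c+c^{-1})B$, is minimized at $c = \sqrt{B/(A+B)}$, which proves the existence of a code satisfying

$$P_e[\Phi^{(n)}] \le A + 2B + 2\sqrt{B(A+B)}.$$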
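
The optimization in Remark 10 is elementary calculus: the derivative of $(1+c)A + (2+c+c^{-1})B$ in $c$ is $A + B - B/c^2$, which vanishes at $c = \sqrt{B/(A+B)}$. The sketch below (standard library only; the values of $A$ and $B$ are illustrative, not taken from the paper, and the function names are hypothetical) compares the closed form with a direct evaluation.

```python
import math

def lemma3_rhs(A, B, c):
    """Right-hand side of (18) written as (1 + c) A + (2 + c + 1/c) B."""
    return (1 + c) * A + (2 + c + 1 / c) * B

def optimal_c(A, B):
    """Minimizer c = sqrt(B / (A + B)) over c > 0 (B > 0 assumed)."""
    return math.sqrt(B / (A + B))

A, B = 0.05, 0.01  # illustrative values only
c_star = optimal_c(A, B)
closed_form = A + 2 * B + 2 * math.sqrt(B * (A + B))
print(c_star, lemma3_rhs(A, B, c_star), closed_form)
```

At the optimum both $c(A+B)$ and $B/c$ equal $\sqrt{B(A+B)}$, which is where the factor 2 in the closed form comes from.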
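
Lemma 4 For any $n \in \mathbb N$ and any code $\Phi^{(n)} \in \mathfrak C(W^{(n)})$ with $|\Phi^{(n)}| = N$, there exists a probability distribution $P^{(n)} \in \mathcal P(\mathcal X^{(n)})$ such that for any $a \in \mathbb R$ and $\sigma^{(n)} \in \mathcal S(\mathcal H^{(n)})$

$$P_e[\Phi^{(n)}] \ge \sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}\sigma^{(n)} \le 0\}\big] - \frac{e^{na}}{N}. \tag{24}$$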



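Proof: Remember that for any self-adjoint operator $A$ and any operator $0 \le T \le I$,

$$\mathrm{Tr}[AT] \le \mathrm{Tr}\big[A\{A > 0\}\big], \tag{25}$$

which is the essence of the quantum Neyman-Pearson lemma [8], [9], [10]. Then we see that for any code $\Phi^{(n)} = (N, \varphi^{(n)}, Y^{(n)})$,

$$\begin{aligned}
\mathrm{Tr}\big[(W^{(n)}_{\varphi^{(n)}(i)} - e^{na}\sigma^{(n)})\,Y^{(n)}_i\big]
&\le \mathrm{Tr}\big[(W^{(n)}_{\varphi^{(n)}(i)} - e^{na}\sigma^{(n)})\{W^{(n)}_{\varphi^{(n)}(i)} - e^{na}\sigma^{(n)} > 0\}\big] \\
&\le \mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}\{W^{(n)}_{\varphi^{(n)}(i)} - e^{na}\sigma^{(n)} > 0\}\big].
\end{aligned}$$

This is rewritten as

$$\mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}(I - Y^{(n)}_i)\big] \ge \mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}\{W^{(n)}_{\varphi^{(n)}(i)} - e^{na}\sigma^{(n)} \le 0\}\big] - e^{na}\,\mathrm{Tr}\big[\sigma^{(n)}Y^{(n)}_i\big],$$

and hence, since $\sum_i Y^{(n)}_i \le I$, we have

$$\begin{aligned}
P_e[\Phi^{(n)}] &\ge \frac{1}{N}\sum_{i=1}^N \mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}\{W^{(n)}_{\varphi^{(n)}(i)} - e^{na}\sigma^{(n)} \le 0\}\big] - \frac{e^{na}}{N}\sum_{i=1}^N \mathrm{Tr}\big[\sigma^{(n)}Y^{(n)}_i\big] \\
&\ge \frac{1}{N}\sum_{i=1}^N \mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}\{W^{(n)}_{\varphi^{(n)}(i)} - e^{na}\sigma^{(n)} \le 0\}\big] - \frac{e^{na}}{N}.
\end{aligned}$$

We thus have (24) by letting $P^{(n)}$ be the empirical distribution of the $N$ points $(\varphi^{(n)}(1), \dots, \varphi^{(n)}(N))$. ■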
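
Lemma 4 holds for any code and any $\sigma^{(n)}$, so it can be spot-checked on a toy code. The sketch below (NumPy, $n = 1$, made-up qubit states and a simple projective decoder; the names are hypothetical) evaluates both sides of (24) with $P^{(n)}$ taken as the empirical distribution of the codewords, i.e. as an average over the codewords.

```python
import numpy as np

def projection_leq_zero(A, tol=1e-12):
    """{A <= 0} for a Hermitian matrix A (eigenvalues <= tol are kept)."""
    w, V = np.linalg.eigh(A)
    V = V[:, w <= tol]
    return V @ V.conj().T

def lemma4_sides(codeword_states, povm, sigma, a, n=1):
    """Return (P_e[Phi], RHS of (24)) using the empirical distribution of the codewords."""
    N = len(codeword_states)
    pe = sum(1.0 - np.trace(Wi @ Yi).real for Wi, Yi in zip(codeword_states, povm)) / N
    spectrum = sum(np.trace(Wi @ projection_leq_zero(Wi - np.exp(n * a) * sigma)).real
                   for Wi in codeword_states) / N
    return pe, spectrum - np.exp(n * a) / N

W0 = np.diag([0.9, 0.1])
W1 = np.array([[0.5, 0.4], [0.4, 0.5]])
W2 = np.diag([0.1, 0.9])
codewords = [W0, W1, W2]
povm = [np.diag([1.0, 0.0]), np.zeros((2, 2)), np.diag([0.0, 1.0])]  # sums to I
sigmas = [np.eye(2) / 2, (W0 + W1 + W2) / 3]
for s in sigmas:
    for a in (-1.0, -0.2, 0.0, 0.2, 0.5):
        print(a, lemma4_sides(codewords, povm, s, a))
```

For fixed $a$ and $\sigma^{(n)}$ the bound can be loose; in the proof of Theorem 1 it is used with $N = |\Phi^{(n)}|$ growing exponentially, so that $e^{na}/N$ vanishes for $a$ below the rate.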

Remark 11 Lemma 4 in the case of $\sigma^{(n)} = W^{(n)}_{P^{(n)}}$ is just the quantum analogue of Theorem 4 in [5], which evaluates the error probability of a code as

$$P_e[\Phi^{(n)}] \ge \mathrm{Prob}\Big\{\frac{1}{n}\log\frac{W^{(n)}(Y^{(n)} \mid X^{(n)})}{P_{Y^{(n)}}(Y^{(n)})} \le a\Big\} - \frac{e^{na}}{N}. \tag{26}$$

Our results might seem to be still incomplete in comparison with the beautiful duality between (23) and (26).
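
5 Proof of Theorem 1

Now Theorem 1 is proved as follows. We first show the inequality

$$C(\mathbf W) \ge \max_{\mathbf P \in \mathcal P(\mathbf X)} \underline I(\mathbf P, \mathbf W). \tag{27}$$

Here we can assume that the RHS is strictly positive, since otherwise the inequality is trivial. Suppose that we are given a sequence $\mathbf P = \{P^{(n)}\} \in \mathcal P(\mathbf X)$ and a number $R$ such that $0 < R < \underline I(\mathbf P, \mathbf W)$. Setting $N = \lceil e^{nR}\rceil$ in Lemma 3, it follows that for each real number $a$ and each $c > 0$ there exists a sequence of codes $\boldsymbol\Phi = \{\Phi^{(n)}\} \in \mathfrak C(\mathbf W)$ such that $|\Phi^{(n)}| = \lceil e^{nR}\rceil$ and

$$P_e[\Phi^{(n)}] \le (1+c)\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} \le 0\}\big] + (2+c+c^{-1})\,e^{-na}\lceil e^{nR}\rceil \tag{28}$$

for every $n$. Recalling the definition of $\underline I(\mathbf P, \mathbf W)$ and the monotonicity of the first term in $a$, we see that the first term of the RHS goes to 0 as $n \to \infty$ for any $a < \underline I(\mathbf P, \mathbf W)$, while the second term goes to 0 for any $a > R$. Hence, letting $a$ lie in $R < a < \underline I(\mathbf P, \mathbf W)$, the existence of a $\boldsymbol\Phi$ satisfying $\liminf_{n\to\infty}\frac{1}{n}\log|\Phi^{(n)}| \ge R$ and $\lim_{n\to\infty} P_e[\Phi^{(n)}] = 0$ is shown. This implies that $R \le C(\mathbf W)$ for any $0 < R < \underline I(\mathbf P, \mathbf W)$, and completes the proof of (27).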
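
To see the two terms of (28) compete, consider the commuting (classical) special case of a binary symmetric channel with crossover probability $p$ and uniform i.i.d. input, where the first term reduces to a binomial tail: each letter contributes $\log 2(1-p)$ without a flip and $\log 2p$ with a flip. The sketch below is an illustration under these assumptions only (standard library; $p = 0.1$, $R = 0.2$, $a = 0.28$ nats and $c = 1$ are chosen so that $R < a < \underline I(\mathbf P, \mathbf W) = \log 2 - h(0.1) \approx 0.368$ nats, with $h$ the binary entropy); the function names are hypothetical.

```python
import math

def log_binom_pmf(n, k, p):
    """log of C(n, k) p^k (1-p)^(n-k), computed with lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def bsc_spectrum_cdf(n, p, a):
    """Prob{(1/n) log W^n(Y|X) / P_Y^n(Y) <= a} for a BSC(p) with uniform input.
    Only the number k of flipped letters matters, and k ~ Binomial(n, p)."""
    good, bad = math.log(2 * (1 - p)), math.log(2 * p)
    total = 0.0
    for k in range(n + 1):
        if (n - k) * good + k * bad <= n * a:
            total += math.exp(log_binom_pmf(n, k, p))
    return total

def bound_28(n, p, R, a, c=1.0):
    """RHS of (28) in this commuting special case, with N = ceil(e^{nR})."""
    first = (1 + c) * bsc_spectrum_cdf(n, p, a)
    second = (2 + c + 1 / c) * math.exp(-n * a) * math.ceil(math.exp(n * R))
    return first + second

for n in (50, 200, 400):
    print(n, bound_28(n, 0.1, 0.2, 0.28))
```

The first term shrinks because $a$ lies below the mean per-letter information density, and the second because $a > R$; choosing $a$ outside $(R, \underline I(\mathbf P, \mathbf W))$ makes one of the two terms fail to vanish.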
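
Next we prove

$$C^\dagger(\mathbf W) \ge \max_{\mathbf P \in \mathcal P(\mathbf X)} \overline I(\mathbf P, \mathbf W). \tag{29}$$

We can assume that $C^\dagger(\mathbf W) < \infty$, since otherwise the inequality is trivial. Let $\mathbf P \in \mathcal P(\mathbf X)$ be arbitrary and let $R$ be an arbitrary number greater than $C^\dagger(\mathbf W)$. Then for each $a$ and $c > 0$ there exists a sequence of codes $\boldsymbol\Phi = \{\Phi^{(n)}\} \in \mathfrak C(\mathbf W)$ such that $|\Phi^{(n)}| = \lceil e^{nR}\rceil$ and (28) holds for every $n$. Since $\lim_{n\to\infty}\frac{1}{n}\log|\Phi^{(n)}| = R > C^\dagger(\mathbf W)$, $P_e[\Phi^{(n)}]$ must go to 1 as $n \to \infty$, and therefore (28) yields that for any $a > R$

$$1 \le (1+c)\liminf_{n\to\infty}\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} \le 0\}\big].$$

Since $c > 0$ is arbitrary, $\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} \le 0\}\big]$ converges to 1, and hence $a \ge \overline I(\mathbf P, \mathbf W)$. We thus have $a \ge \overline I(\mathbf P, \mathbf W)$ for all $a > R > C^\dagger(\mathbf W)$, and (29) has been proved.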

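Let us proceed to prove the converse inequality

$$C(\mathbf W) \le \max_{\mathbf P \in \mathcal P(\mathbf X)}\ \min_{\boldsymbol\sigma \in \mathcal S(\mathbf H)} \underline J(\mathbf P, \boldsymbol\sigma, \mathbf W). \tag{30}$$

Let $R < C(\mathbf W)$. Then there exists a sequence of codes $\boldsymbol\Phi = \{\Phi^{(n)}\} \in \mathfrak C(\mathbf W)$ satisfying

$$\liminf_{n\to\infty}\frac{1}{n}\log|\Phi^{(n)}| > R \quad\text{and}\quad \lim_{n\to\infty} P_e[\Phi^{(n)}] = 0.$$

From Lemma 4, there exists a $\mathbf P = \{P^{(n)}\}$ such that for any $n \in \mathbb N$, $a \in \mathbb R$ and $\boldsymbol\sigma = \{\sigma^{(n)}\}$,

$$\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}\sigma^{(n)} \le 0\}\big] \le P_e[\Phi^{(n)}] + \frac{e^{na}}{|\Phi^{(n)}|},$$

and the RHS goes to 0 as $n \to \infty$ for any $a < R$, since $|\Phi^{(n)}| \ge e^{nR}$ for all sufficiently large $n$. This implies that $R \le \min_{\boldsymbol\sigma}\underline J(\mathbf P, \boldsymbol\sigma, \mathbf W)$ for some $\mathbf P$. Therefore we have

$$R \le \max_{\mathbf P}\ \min_{\boldsymbol\sigma} \underline J(\mathbf P, \boldsymbol\sigma, \mathbf W)$$

for any $R < C(\mathbf W)$, and (30) has been proved. Similarly, we can prove

$$C^\dagger(\mathbf W) \le \max_{\mathbf P \in \mathcal P(\mathbf X)}\ \min_{\boldsymbol\sigma \in \mathcal S(\mathbf H)} \overline J(\mathbf P, \boldsymbol\sigma, \mathbf W). \tag{31}$$

The remaining parts

$$\max_{\mathbf P \in \mathcal P(\mathbf X)} \underline I(\mathbf P, \mathbf W) \ge \max_{\mathbf P \in \mathcal P(\mathbf X)}\ \min_{\boldsymbol\sigma \in \mathcal S(\mathbf H)} \underline J(\mathbf P, \boldsymbol\sigma, \mathbf W)$$

and

$$\max_{\mathbf P \in \mathcal P(\mathbf X)} \overline I(\mathbf P, \mathbf W) \ge \max_{\mathbf P \in \mathcal P(\mathbf X)}\ \min_{\boldsymbol\sigma \in \mathcal S(\mathbf H)} \overline J(\mathbf P, \boldsymbol\sigma, \mathbf W)$$

are obvious from the definitions, since $\underline I(\mathbf P, \mathbf W) = \underline J(\mathbf P, \mathbf W_{\mathbf P}, \mathbf W)$ and $\overline I(\mathbf P, \mathbf W) = \overline J(\mathbf P, \mathbf W_{\mathbf P}, \mathbf W)$.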

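6 Stationary memoryless case

In this section we demonstrate how the general formulas given in Theorem 1 lead to the following coding theorem for stationary memoryless channels.

Theorem 2 Let $W: \mathcal X \to \mathcal S(\mathcal H)$ be an arbitrary channel and consider its stationary memoryless extension:

$$\mathcal H^{(n)} = \mathcal H^{\otimes n}, \quad \mathcal X^{(n)} = \mathcal X^n \quad\text{and}\quad W^{(n)}_{x^n} = W_{x_1} \otimes \cdots \otimes W_{x_n} \ \text{ for } x^n = (x_1, \dots, x_n). \tag{32}$$

Then the capacity of $\mathbf W = \{W^{(n)}\}$ is given by

$$C(\mathbf W) = \sup_{P \in \mathcal P(\mathcal X)} I(P, W), \tag{33}$$

where

$$I(P, W) \overset{\text{def}}{=} \sum_{x \in \mathcal X} P(x)\, D(W_x \,\|\, W_P), \qquad W_P \overset{\text{def}}{=} \sum_{x \in \mathcal X} P(x)\, W_x,$$

with $D(\rho \,\|\, \sigma) \overset{\text{def}}{=} \mathrm{Tr}[\rho(\log\rho - \log\sigma)]$ being the quantum relative entropy. Furthermore, if $\dim\mathcal H < \infty$, then the strong converse holds: $C^\dagger(\mathbf W) = C(\mathbf W)$.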
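
The quantity $I(P, W)$ in (33) can be computed directly for finite-dimensional examples. The sketch below (NumPy, made-up full-rank qubit states, hypothetical names) evaluates it as $\sum_x P(x) D(W_x \| W_P)$ and, for comparison, as the entropy difference in (34) below; the two agree because $\sum_x P(x)\,\mathrm{Tr}[W_x \log W_P] = \mathrm{Tr}[W_P \log W_P]$. Full-rank states are assumed so that all logarithms are finite.

```python
import numpy as np

def logm_pd(rho):
    """Matrix logarithm of a full-rank density matrix via eigendecomposition."""
    w, V = np.linalg.eigh(rho)
    assert w.min() > 0, "this sketch assumes full-rank states"
    return (V * np.log(w)) @ V.conj().T

def relative_entropy(rho, sigma):
    """D(rho || sigma) = Tr[rho (log rho - log sigma)] in nats."""
    return np.trace(rho @ (logm_pd(rho) - logm_pd(sigma))).real

def von_neumann_entropy(rho):
    """H(rho) = -Tr[rho log rho] in nats (zero eigenvalues dropped)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]
    return float(-np.sum(w * np.log(w)))

def holevo_quantity(P, W):
    """I(P, W) = sum_x P(x) D(W_x || W_P) with W_P = sum_x P(x) W_x."""
    W_P = sum(P[x] * W[x] for x in P)
    return sum(P[x] * relative_entropy(W[x], W_P) for x in P)

W = {0: np.diag([0.9, 0.1]), 1: np.array([[0.5, 0.4], [0.4, 0.5]])}
P = {0: 0.5, 1: 0.5}
W_P = sum(P[x] * W[x] for x in P)
entropy_difference = von_neumann_entropy(W_P) - sum(P[x] * von_neumann_entropy(W[x]) for x in P)
print(holevo_quantity(P, W), entropy_difference)  # the two values agree
```

The relative-entropy form is the one used in this paper; as noted below, the entropy-difference form requires $H(W_x) < \infty$ when $\dim\mathcal H = \infty$.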

Remark 12 The proof of the strong converse given below relies essentially on the compactness of the closure of the range $\mathcal A = \{W_x \mid x \in \mathcal X\}$, which follows from the finiteness of $\dim\mathcal H$. The argument is immediately extended to a certain class of channels with $\dim\mathcal H = \infty$, including the case when $\mathcal X$ is a finite set, whereas the general condition for the strong converse in the infinite-dimensional case is yet to be studied.

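Remark 13 Let $\Gamma$ be a trace-preserving CP map from the trace-class operators on $\mathcal H_1$ to those on $\mathcal H_2$. When considering $\Gamma$ as a classical-quantum channel $W: \mathcal X \to \mathcal S(\mathcal H_2)$ with $\mathcal X = \mathcal S(\mathcal H_1)$, its stationary memoryless extension is a channel which maps an $n$-tuple $(\sigma_1, \dots, \sigma_n) \in \mathcal X^{(n)} = \mathcal X^n$ of states $\{\sigma_i\} \subset \mathcal S(\mathcal H_1)$ to the product state $\Gamma(\sigma_1) \otimes \cdots \otimes \Gamma(\sigma_n)$, and the capacity of $\mathbf W = \{W^{(n)}\}$ is given by (33). On the other hand, $\Gamma$ has the stationary memoryless extension $\Gamma^{\otimes n}$ as a "quantum-quantum" channel, which defines another classical-quantum channel $\tilde W^{(n)}: \tilde{\mathcal X}^{(n)} \to \mathcal S(\mathcal H_2^{\otimes n})$ with $\tilde{\mathcal X}^{(n)} = \mathcal S(\mathcal H_1^{\otimes n})$. Note that $W^{(n)}$ can be regarded as the restriction $\tilde W^{(n)}|_{\mathcal X^{(n)}}$ of $\tilde W^{(n)}$ by identifying $(\sigma_1, \dots, \sigma_n) \in \mathcal X^n$ with $\sigma_1 \otimes \cdots \otimes \sigma_n \in \tilde{\mathcal X}^{(n)}$. The capacity $C(\tilde{\mathbf W})$ of $\tilde{\mathbf W} = \{\tilde W^{(n)}\}$ is beyond the scope of the preceding theorem, whereas recently the conjecture $C(\tilde{\mathbf W}) = C(\mathbf W)$, together with the more fundamental additivity conjecture, has been attracting wide attention. See, for instance, (221 123 12H 123 and the references cited there.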

Historically, the converse part $C(\mathbf W) \le \sup_{P \in \mathcal P(\mathcal X)} I(P, W)$ was first established by Holevo's early work [3], [4], which is now often referred to as the Holevo bound, while the direct part $C(\mathbf W) \ge \sup_{P \in \mathcal P(\mathcal X)} I(P, W)$ was proved much more recently by Holevo [1] and Schumacher-Westmoreland [2]. It should be noted that their proof is based on the representation of $I(P, W)$ as the entropy difference

$$I(P, W) = H(W_P) - \sum_x P(x)\, H(W_x), \tag{34}$$

where $H(\rho) \overset{\text{def}}{=} -\mathrm{Tr}[\rho\log\rho]$ is the von Neumann entropy, and hence needs (when $\dim\mathcal H = \infty$) the assumption

$$H(W_x) < \infty, \quad \forall x \in \mathcal X. \tag{35}$$

See the next section for more details. Our proof given below has the advantage of not needing this finiteness assumption (cf. Remark [Tljj). Note also that in the case when $\dim\mathcal H < \infty$, the range of the supremum in (33) can be restricted to those $P \in \mathcal P(\mathcal X)$ with $|\mathrm{supp}(P)| \le \dim\mathcal A + 1$, where $|\mathrm{supp}(P)|$ denotes the number of elements of the support of $P$ and $\mathcal A \overset{\text{def}}{=} \{W_x \mid x \in \mathcal X\}$, and that the supremum can be replaced with a maximum when $\mathcal A$ is closed (and hence compact); see [T%1 126j. The strong converse $C^\dagger(\mathbf W) = \sup_{P \in \mathcal P(\mathcal X)} I(P, W)$ for a finite $\mathcal X$ was shown in [THl Ej.

Let us begin by considering the (weak) converse

$$C(\mathbf W) \le \sup_{P \in \mathcal P(\mathcal X)} I(P, W). \tag{36}$$
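
Lemma 5 For any sequence of channels $\mathbf W = \{W^{(n)}\}$ and any sequence of distributions $\mathbf P = \{P^{(n)}\}$ we have

$$\underline I(\mathbf P, \mathbf W) \le \liminf_{n\to\infty}\frac{1}{n} I(P^{(n)}, W^{(n)}). \tag{37}$$

Proof: Given $n$, $x^n \in \mathcal X^{(n)}$ and $a \in \mathbb R$ arbitrarily, let

$$\alpha_n \overset{\text{def}}{=} \mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}\big], \qquad \beta_n \overset{\text{def}}{=} \mathrm{Tr}\big[W^{(n)}_{P^{(n)}}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}\big].$$

Then the monotonicity of the quantum relative entropy, applied to the two-outcome measurement consisting of $\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}$ and its complement, yields

$$D(W^{(n)}_{x^n} \,\|\, W^{(n)}_{P^{(n)}}) \ge \alpha_n\log\frac{\alpha_n}{\beta_n} + (1-\alpha_n)\log\frac{1-\alpha_n}{1-\beta_n} \ge -\log 2 - \alpha_n\log\beta_n.$$

On the other hand, substituting $A = W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}}$ into $\mathrm{Tr}[A\{A > 0\}] \ge 0$, we have

$$\beta_n \le e^{-na}\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}\big] = e^{-na}\alpha_n,$$

and hence $\beta_n \le e^{-na}\alpha_n \le e^{-na}$. We thus obtain $\frac{1}{n}D(W^{(n)}_{x^n} \| W^{(n)}_{P^{(n)}}) \ge -\frac{1}{n}\log 2 + a\alpha_n$, and taking the expectation with respect to $P^{(n)}$ we have

$$\frac{1}{n} I(P^{(n)}, W^{(n)}) \ge -\frac{1}{n}\log 2 + a\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}\big].$$

This leads to the implications

$$0 < a < \underline I(\mathbf P, \mathbf W) \;\Longrightarrow\; \lim_{n\to\infty}\sum_{x^n} P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}\big] = 1 \;\Longrightarrow\; a \le \liminf_{n\to\infty}\frac{1}{n} I(P^{(n)}, W^{(n)}),$$

which proves the lemma. ■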
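
The elementary inequality used in the proof, $\alpha\log\frac{\alpha}{\beta} + (1-\alpha)\log\frac{1-\alpha}{1-\beta} \ge -\log 2 - \alpha\log\beta$, follows from $\alpha\log\alpha + (1-\alpha)\log(1-\alpha) \ge -\log 2$ and $-(1-\alpha)\log(1-\beta) \ge 0$. A quick grid check (standard library only; a numerical illustration rather than a proof, with hypothetical function names):

```python
import math

def binary_relative_entropy(alpha, beta):
    """d(alpha || beta) = alpha log(alpha/beta) + (1-alpha) log((1-alpha)/(1-beta)),
    the relative entropy after the two-outcome measurement used in the proof."""
    def term(p, q):
        return 0.0 if p == 0 else p * math.log(p / q)
    return term(alpha, beta) + term(1 - alpha, 1 - beta)

def lower_bound(alpha, beta):
    """-log 2 - alpha log beta, the bound used in the proof of Lemma 5."""
    return -math.log(2) - alpha * math.log(beta)

grid = [i / 20 for i in range(21)]
# beta ranges over the open interval (0, 1) so that both logarithms are finite
worst = min(binary_relative_entropy(al, be) - lower_bound(al, be)
            for al in grid for be in grid[1:-1])
print(worst >= 0)  # True
```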
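
Using this lemma and invoking that in the stationary memoryless case

$$\sup_{P^{(n)} \in \mathcal P(\mathcal X^n)} I(P^{(n)}, W^{(n)}) = n\sup_{P \in \mathcal P(\mathcal X)} I(P, W),$$

we see that (36) follows from the general formula $C(\mathbf W) \le \max_{\mathbf P}\underline I(\mathbf P, \mathbf W)$.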

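Before proceeding to the direct and strong converse parts, we introduce quantum analogues of the spectral inf- and sup-divergence rates [H] (see Remark 6): given arbitrary sequences of states $\boldsymbol\rho = \{\rho^{(n)}\}$ and $\boldsymbol\sigma = \{\sigma^{(n)}\}$, let

$$\overline D(\boldsymbol\rho \,\|\, \boldsymbol\sigma) \overset{\text{def}}{=} \inf\Big\{a \;\Big|\; \lim_{n\to\infty}\mathrm{Tr}\big[\rho^{(n)}\{\rho^{(n)} - e^{na}\sigma^{(n)} > 0\}\big] = 0\Big\}, \tag{38}$$

$$\underline D(\boldsymbol\rho \,\|\, \boldsymbol\sigma) \overset{\text{def}}{=} \sup\Big\{a \;\Big|\; \lim_{n\to\infty}\mathrm{Tr}\big[\rho^{(n)}\{\rho^{(n)} - e^{na}\sigma^{(n)} > 0\}\big] = 1\Big\}. \tag{39}$$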
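Note that $\underline D(\boldsymbol\rho\|\boldsymbol\sigma) \le \overline D(\boldsymbol\rho\|\boldsymbol\sigma)$ and that $\underline D(\boldsymbol\rho\|\boldsymbol\sigma) \le \liminf_{n\to\infty}\frac1n D(\rho^{(n)}\|\sigma^{(n)})$, the latter of which can be proved similarly to Lemma 5. The following relation, which was shown in [10], will play an essential role in the later arguments: in the quantum i.i.d. case when $\boldsymbol\rho = \{\rho^{\otimes n}\}_{n=1}^\infty$ and $\boldsymbol\sigma = \{\sigma^{\otimes n}\}_{n=1}^\infty$, we have

$$\underline D(\boldsymbol\rho\|\boldsymbol\sigma) = \overline D(\boldsymbol\rho\|\boldsymbol\sigma) = D(\rho\|\sigma). \tag{40}$$

Now let us observe how the direct part

$$C(\mathbf W) \ge \sup_{P\in\mathcal P(\mathcal X)} I(P,W) \tag{41}$$

follows from the general formula. Let $P$ be an arbitrary distribution in $\mathcal P(\mathcal X)$ and $P^{(n)} \in \mathcal P(\mathcal X^n)$ be the $n$th i.i.d. extension: $P^{(n)}(x^n) = P(x_1)\cdots P(x_n)$ for $x^n = (x_1,\dots,x_n)$. Denoting the support of $P$ by $\{u_1,\dots,u_k\} \subset \mathcal X$ and letting $\lambda_i = P(u_i)$, $\rho_i = W_{u_i}$ and $\sigma = W_P = \sum_i\lambda_i\rho_i$, we have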



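$$\begin{aligned}\sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{P^{(n)}} > 0\}\big] &= \sum_{i_1,\dots,i_n}\lambda_{i_1}\cdots\lambda_{i_n}\,\mathrm{Tr}\big[(\rho_{i_1}\otimes\cdots\otimes\rho_{i_n})\{\rho_{i_1}\otimes\cdots\otimes\rho_{i_n} - e^{na}\sigma^{\otimes n} > 0\}\big]\\ &= \mathrm{Tr}\big[R^{\otimes n}\{R^{\otimes n} - e^{na}S^{\otimes n} > 0\}\big],\end{aligned}$$

where

$$R \stackrel{\rm def}{=} \begin{pmatrix}\lambda_1\rho_1 & & 0\\ & \ddots & \\ 0 & & \lambda_k\rho_k\end{pmatrix},\qquad S \stackrel{\rm def}{=} \begin{pmatrix}\lambda_1\sigma & & 0\\ & \ddots & \\ 0 & & \lambda_k\sigma\end{pmatrix}. \tag{42}$$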



We thus have for the sequences $\mathbf P = \{P^{(n)}\}$, $\mathbf R = \{R^{\otimes n}\}$ and $\mathbf S = \{S^{\otimes n}\}$

$$\underline I(\mathbf P,\mathbf W) = \underline D(\mathbf R\|\mathbf S) = D(R\|S) = I(P,W), \tag{43}$$

where the second equality follows from (40) and the rest are immediate from the definitions of the quantities. This, combined with (7), completes the proof of (41).
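The identity $D(R\|S) = I(P,W)$ behind (43) can be checked numerically. The sketch below is an illustration under assumptions of our own (NumPy, three random qubit states $\rho_i$ with weights $\lambda_i$, hypothetical helper names); it builds $R$ and $S$ as the block-diagonal matrices of (42) and compares $D(R\|S)$ with $\sum_i\lambda_i D(\rho_i\|\sigma)$.

```python
import numpy as np


def random_state(d, rng):
    """Random full-rank density matrix on C^d (Wishart sample plus a small identity part)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T + 0.05 * np.eye(d)
    return rho / np.trace(rho).real


def mat_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a through its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def rel_entropy(rho, sigma):
    """Quantum relative entropy D(rho || sigma) in nats, for full-rank states."""
    return np.trace(rho @ (mat_fun(rho, np.log) - mat_fun(sigma, np.log))).real


def direct_sum(blocks):
    """Block-diagonal matrix with the given square blocks on the diagonal."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size), dtype=complex)
    start = 0
    for b in blocks:
        k = b.shape[0]
        out[start:start + k, start:start + k] = b
        start += k
    return out


def holevo_quantity(lams, rhos):
    """I(P, W) = sum_i lambda_i D(rho_i || sigma) with sigma = sum_i lambda_i rho_i."""
    sigma = sum(l * r for l, r in zip(lams, rhos))
    return sum(l * rel_entropy(r, sigma) for l, r in zip(lams, rhos))


def embedded_pair(lams, rhos):
    """R = diag(lambda_i rho_i) and S = diag(lambda_i sigma), as in (42)."""
    sigma = sum(l * r for l, r in zip(lams, rhos))
    R = direct_sum([l * r for l, r in zip(lams, rhos)])
    S = direct_sum([l * sigma for l in lams])
    return R, S


rng = np.random.default_rng(2)
lams = [0.5, 0.3, 0.2]
rhos = [random_state(2, rng) for _ in lams]
R, S = embedded_pair(lams, rhos)
print(rel_entropy(R, S), holevo_quantity(lams, rhos))  # the two values coincide up to rounding
```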

Remark 14 Essential in the above derivation of (41) from (7) is the use of $\underline D(\boldsymbol\rho\|\boldsymbol\sigma) \ge D(\rho\|\sigma)$ for sequences of i.i.d. states. The proof of this inequality given in [10] is based on the direct part of the quantum Stein's lemma for a hypothesis testing problem on $\rho^{\otimes n}$ and $\sigma^{\otimes n}$, which was first shown by Hiai and Petz [11], whereas the classical counterpart of the inequality is a direct consequence of the weak law of large numbers. Hence the above derivation can be thought of as a proof of the channel coding theorem via the theory of quantum hypothesis testing (cf. Remark 15 below). It should be noted that a significant characteristic of the proof lies in the separation of the coding part and the limiting part; the former is entirely handled by the general formula (7), or equivalently by the non-asymptotic arguments of Lemma 2 and Lemma 3, while the latter relies on the asymptotic analysis of quantum hypothesis testing. Another proof of (41) with a similar approach is found in [14], where the coding part is proved by a variant of quantum Feinstein's lemma (cf. Remark 5) and the limiting part is based on an asymptotic analysis made in [27] (cf. Remark 17 below) on a variant of $\underline D(\boldsymbol\rho\|\boldsymbol\sigma)$, which is much easier to treat than the original $\underline D(\boldsymbol\rho\|\boldsymbol\sigma)$.

Remark 15 As a matter of fact, (41) can be proved by directly applying Lemma 2 to the direct part of the quantum Stein's lemma as follows, without appealing to the general formula (7). Given $P \in \mathcal P(\mathcal X)$, let $R$ and $S$ be defined by (42), which can be represented as $R = \bigoplus_x P(x)W_x$ and $S = \bigoplus_x P(x)W_P$. For an arbitrary $\varepsilon > 0$ and a sufficiently large $n$, it follows from the quantum Stein's lemma that there exists a projection of the form $T^{(n)} = \bigoplus_{x^n}T^{(n)}_{x^n}$, where $\{T^{(n)}_{x^n}\}$ are projections on $\mathcal H^{\otimes n}$, such that

$$\mathrm{Tr}[R^{\otimes n}T^{(n)}] = \sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}T^{(n)}_{x^n}\big] \ge 1-\varepsilon,$$

$$\mathrm{Tr}[S^{\otimes n}T^{(n)}] = \sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{P^{(n)}}T^{(n)}_{x^n}\big] \le e^{-n(D(R\|S)-\varepsilon)}.$$

Given an encoder $\varphi^{(n)}: \{1,\dots,N\} \to \mathcal X^n$, define the decoding POVM $Z^{(n)} = \{Z^{(n)}_i\}_{i=1}^N$ by

$$Z^{(n)}_i = \Big(\sum_{j=1}^N T^{(n)}_{\varphi^{(n)}(j)}\Big)^{-1/2} T^{(n)}_{\varphi^{(n)}(i)} \Big(\sum_{j=1}^N T^{(n)}_{\varphi^{(n)}(j)}\Big)^{-1/2}.$$

Then, replacing $Y^{(n)}$ with $Z^{(n)}$ in the proof of Lemma 3, using Lemma 2 with $c = 1$ (for example) and applying random coding with respect to $P^{(n)}$, we see that there exists a code $\Phi^{(n)}$ satisfying

$$\begin{aligned}P_e[\Phi^{(n)}] &\le 2\big(1 - \mathrm{Tr}[R^{\otimes n}T^{(n)}]\big) + 4N\,\mathrm{Tr}[S^{\otimes n}T^{(n)}]\\ &\le 2\varepsilon + 4e^{-n(D(R\|S)-\varepsilon)}N,\end{aligned} \tag{44}$$

which proves (41) since $D(R\|S) = I(P,W)$.
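Remark 15 (and Lemma 3) rely on Lemma 2, which is stated earlier in the paper and not reproduced in this section. We assume here that it is the operator inequality $I - (S+T)^{-1/2}S(S+T)^{-1/2} \le (1+c)(I-S) + (2+c+c^{-1})T$ for $0 \le S \le I$, $T \ge 0$ and $c > 0$; with $c = 1$ its coefficients are $2$ and $4$, which matches (44). The sketch checks this inequality on random instances with NumPy; `lemma2_gap` is a hypothetical name.

```python
import numpy as np


def random_psd(d, rng):
    """Random positive definite matrix (Wishart sample plus a small identity part)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T + 0.05 * np.eye(d)


def mat_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a through its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def lemma2_gap(S, T, c):
    """Smallest eigenvalue of RHS - LHS in the assumed Lemma 2 inequality
    I - (S+T)^{-1/2} S (S+T)^{-1/2} <= (1+c)(I-S) + (2+c+1/c) T.
    A nonnegative value means the inequality holds for this instance."""
    d = S.shape[0]
    inv_sqrt = mat_fun(S + T, lambda w: 1.0 / np.sqrt(w))  # S+T is invertible here
    lhs = np.eye(d) - inv_sqrt @ S @ inv_sqrt
    rhs = (1 + c) * (np.eye(d) - S) + (2 + c + 1 / c) * T
    diff = rhs - lhs
    return np.linalg.eigvalsh((diff + diff.conj().T) / 2).min()


rng = np.random.default_rng(3)
A = random_psd(4, rng)
S = A / (1.1 * np.linalg.eigvalsh(A).max())  # 0 <= S <= I
T = 0.1 * random_psd(4, rng)                # T >= 0
for c in [0.5, 1.0, 2.0]:
    print(c, lemma2_gap(S, T, c))  # c = 1 gives the coefficients 2 and 4 seen in (44)
```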

Remark 16 As is shown in section 4 of [11], from the fact that the (direct part of the) quantum Stein's lemma holds for states on every finite-dimensional matrix algebra, it is immediately concluded that the lemma also holds for states on every AFD (approximately finite dimensional) operator algebra, including the algebra $B(\mathcal H)$ of bounded operators on a separable Hilbert space $\mathcal H$. This means that our proof of (41) is valid for every channel $W$ on a separable Hilbert space $\mathcal H$ without the finiteness assumption (35). Note also that a similar argument based on the AFD property can be applied to the channel coding problem directly to remove the finiteness assumption from the proof of Holevo-Schumacher-Westmoreland.
Remark 17 Combining the argument in Remark 15 with the derivation of the direct part of the quantum Stein's lemma given in [27] provides one of the simplest proofs of (41) (for a finite-dimensional $\mathcal H$). In addition, applying Theorem 2 of [27] to (44) shows that for any $n$ and $a > 0$ there exists a code $\Phi^{(n)}$ satisfying

$$|\Phi^{(n)}| \;\cdots \qquad\text{and}\qquad \cdots, \tag{45}$$

where $d = k\dim\mathcal H$ (the size of the matrices $R$ and $S$) and

$$\varphi(a) \stackrel{\rm def}{=} \max_{0\le t\le 1}\Big(-at - \log\mathrm{Tr}[\cdots]\Big) = \max_{0\le t\le 1}\Big(-at - \log\sum_{i=1}^k\lambda_i\,\mathrm{Tr}[\cdots]\Big).$$

As was shown in [27], $\varphi(a) > 0$ holds for any $a < D(R\|S) = I(P,W)$, and (45) gives an exponential bound on the error probability.



Next we proceed to the strong converse part

$$C^\dagger(\mathbf W) \le \sup_{P\in\mathcal P(\mathcal X)} I(P,W) \tag{46}$$

under the assumption that $\mathcal H$ is finite-dimensional. In order to link (46) to the general formula, we use the following relations ([28], [29]):

$$\begin{aligned}\sup_{P\in\mathcal P(\mathcal X)} I(P,W) &= \sup_{P\in\mathcal P(\mathcal X)}\min_{\sigma\in\mathcal S(\mathcal H)} J(P,\sigma,W)\\ &= \min_{\sigma\in\mathcal S(\mathcal H)}\sup_{P\in\mathcal P(\mathcal X)} J(P,\sigma,W)\\ &= \min_{\sigma\in\mathcal S(\mathcal H)}\sup_{x\in\mathcal X} D(W_x\|\sigma),\end{aligned} \tag{47}$$

where

$$J(P,\sigma,W) = \sum_x P(x)D(W_x\|\sigma).$$

These relations can be derived just in parallel with their classical counterparts (e.g., pp. 142-147 of [30], Theorem 4.5.1 of [31]) by the use of a minimax theorem for a certain class of two-variable convex-concave functions (e.g., Chap. VI of [32]), combined with the fact that the supremum $\sup_{P\in\mathcal P(\mathcal X)} I(P,W)$ is attained when $\{W_x \mid x \in \mathcal X\}$ is closed ([18], [26]).
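The first equality in (47) reflects the standard identity $J(P,\sigma,W) = I(P,W) + D(W_P\|\sigma)$ (not stated explicitly in the text), so that the minimum over $\sigma$ is attained at $\sigma = W_P$. A small NumPy check of this identity, with randomly chosen states and hypothetical helper names, might look as follows.

```python
import numpy as np


def random_state(d, rng):
    """Random full-rank density matrix on C^d (Wishart sample plus a small identity part)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T + 0.05 * np.eye(d)
    return rho / np.trace(rho).real


def mat_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a through its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def rel_entropy(rho, sigma):
    """Quantum relative entropy D(rho || sigma) in nats, for full-rank states."""
    return np.trace(rho @ (mat_fun(rho, np.log) - mat_fun(sigma, np.log))).real


def J(lams, states, sigma):
    """J(P, sigma, W) = sum_x P(x) D(W_x || sigma)."""
    return sum(l * rel_entropy(w, sigma) for l, w in zip(lams, states))


rng = np.random.default_rng(4)
lams = [0.2, 0.5, 0.3]
states = [random_state(2, rng) for _ in lams]
w_p = sum(l * w for l, w in zip(lams, states))   # output state W_P
holevo = J(lams, states, w_p)                     # I(P, W) = J(P, W_P, W)
sigma = random_state(2, rng)
print(J(lams, states, sigma), holevo + rel_entropy(w_p, sigma))  # equal up to rounding
```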

In proving the strong converse of the quantum hypothesis testing problem for two i.i.d. states, which is equivalent to the part $\overline D(\boldsymbol\rho\|\boldsymbol\sigma) \le D(\rho\|\sigma)$ in (40) (see [10]), Ogawa and Nagaoka [12] showed that for any states $\rho$, $\sigma$ and any numbers $c > 0$ and $0 \le s \le 1$,

$$\mathrm{Tr}\big[\rho\{\rho - c\sigma > 0\}\big] \le c^{-s}\,\mathrm{Tr}\big[\rho^{1+s}\sigma^{-s}\big]. \tag{48}$$
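Inequality (48) is easy to probe numerically. The following sketch, assuming NumPy and random full-rank states (so that $\sigma^{-s}$ is defined), compares both sides for several values of $c$ and $s \in [0,1]$; it is a sanity check rather than a proof, and the helper names are our own.

```python
import numpy as np


def random_state(d, rng):
    """Random full-rank density matrix on C^d (Wishart sample plus a small identity part)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T + 0.05 * np.eye(d)
    return rho / np.trace(rho).real


def mat_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a through its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def power(a, p):
    """Matrix power a^p of a positive definite matrix."""
    return mat_fun(a, lambda w: w ** p)


def positive_projection(a):
    """Spectral projection {a > 0}."""
    w, v = np.linalg.eigh(a)
    vp = v[:, w > 0]
    return vp @ vp.conj().T


def ogawa_nagaoka_sides(rho, sigma, c, s):
    """Return (Tr[rho {rho - c sigma > 0}], c^{-s} Tr[rho^{1+s} sigma^{-s}])."""
    lhs = np.trace(rho @ positive_projection(rho - c * sigma)).real
    rhs = c ** (-s) * np.trace(power(rho, 1 + s) @ power(sigma, -s)).real
    return lhs, rhs


rng = np.random.default_rng(3)
rho, sigma = random_state(3, rng), random_state(3, rng)
for c in [0.5, 1.0, 2.0]:
    for s in [0.0, 0.5, 1.0]:
        lhs, rhs = ogawa_nagaoka_sides(rho, sigma, c, s)
        print(f"c={c}, s={s}: {lhs:.4f} <= {rhs:.4f}")
```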
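Applying this to the states $W^{(n)}_{x^n} = W_{x_1}\otimes\cdots\otimes W_{x_n}$, $\sigma^{\otimes n}$ and $c = e^{na}$, we have

$$\begin{aligned}\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}\sigma^{\otimes n} > 0\}\big] &\le \exp\Big[-n\Big(as - \frac1n\sum_{i=1}^n\log\mathrm{Tr}\big[W_{x_i}^{1+s}\sigma^{-s}\big]\Big)\Big]\\ &\le \exp\Big[-n\Big(as - \sup_{x\in\mathcal X}\log\mathrm{Tr}\big[W_x^{1+s}\sigma^{-s}\big]\Big)\Big].\end{aligned} \tag{49}$$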
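The first bound in (49) uses the multiplicativity $\mathrm{Tr}[(\bigotimes_i W_{x_i})^{1+s}(\sigma^{\otimes n})^{-s}] = \prod_i\mathrm{Tr}[W_{x_i}^{1+s}\sigma^{-s}]$. A two-letter NumPy illustration (with $n = 2$ and arbitrarily chosen $a$ and $s$; all names hypothetical) checks this factorization and the resulting bound.

```python
import numpy as np


def random_state(d, rng):
    """Random full-rank density matrix on C^d (Wishart sample plus a small identity part)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T + 0.05 * np.eye(d)
    return rho / np.trace(rho).real


def mat_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a through its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def power(a, p):
    """Matrix power a^p of a positive definite matrix."""
    return mat_fun(a, lambda w: w ** p)


def positive_projection(a):
    """Spectral projection {a > 0}."""
    w, v = np.linalg.eigh(a)
    vp = v[:, w > 0]
    return vp @ vp.conj().T


def trace_term(w, sigma, s):
    """Tr[w^{1+s} sigma^{-s}]."""
    return np.trace(power(w, 1 + s) @ power(sigma, -s)).real


rng = np.random.default_rng(9)
w1, w2, sigma = (random_state(2, rng) for _ in range(3))
s, a, n = 0.4, 0.1, 2
w12, sigma2 = np.kron(w1, w2), np.kron(sigma, sigma)
joint = trace_term(w12, sigma2, s)
factored = trace_term(w1, sigma, s) * trace_term(w2, sigma, s)
# Left-hand side of (49) and its first upper bound, for the word (x_1, x_2).
error_term = np.trace(w12 @ positive_projection(w12 - np.exp(n * a) * sigma2)).real
bound = np.exp(-n * (a * s - (np.log(trace_term(w1, sigma, s))
                              + np.log(trace_term(w2, sigma, s))) / n))
print(joint, factored, error_term, bound)
```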



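Now assume that $\mathrm{Im}\,\sigma \supseteq \mathrm{Im}\,W_x$ for all $x \in \mathcal X$, where $\mathrm{Im}$ denotes the image (range) of an operator, let $\bar{\mathcal A}$ be the closure of the range $\mathcal A = \{W_x \mid x \in \mathcal X\}$, and define the function $f: [0,1]\times\bar{\mathcal A} \to \mathbb R$ by $f(s,\rho) = \log\mathrm{Tr}[\rho^{1+s}\sigma^{-s}]$. Then we have $f(0,\rho) = 0$ and

$$\frac{\partial}{\partial s}f(0,\rho) = D(\rho\|\sigma). \tag{50}$$

Moreover, since the derivative

$$\frac{\partial}{\partial s}f(s,\rho) = \frac{\mathrm{Tr}\big[\rho^{1+s}(\log\rho - \log\sigma)\sigma^{-s}\big]}{\mathrm{Tr}\big[\rho^{1+s}\sigma^{-s}\big]}$$

is continuous with respect to both $s$ and $\rho$, and since $\bar{\mathcal A}$ is compact, we see that the differentiation in (50) is uniform in $\rho$; i.e.,

$$\lim_{s\downarrow 0}\max_{\rho\in\bar{\mathcal A}}\Big|\frac{f(s,\rho)}{s} - D(\rho\|\sigma)\Big| = 0.$$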
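The derivative formula and (50) can be checked by finite differences. In the sketch below (NumPy, random full-rank $\rho$ and $\sigma$, hypothetical names `f` and `df_ds`), $f(s)/s$ at small $s$ is compared with $D(\rho\|\sigma)$, and the closed-form derivative is compared with a central difference at $s = 0.3$.

```python
import numpy as np


def random_state(d, rng):
    """Random full-rank density matrix on C^d (Wishart sample plus a small identity part)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T + 0.05 * np.eye(d)
    return rho / np.trace(rho).real


def mat_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a through its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def power(a, p):
    """Matrix power a^p of a positive definite matrix."""
    return mat_fun(a, lambda w: w ** p)


def rel_entropy(rho, sigma):
    """Quantum relative entropy D(rho || sigma) in nats, for full-rank states."""
    return np.trace(rho @ (mat_fun(rho, np.log) - mat_fun(sigma, np.log))).real


def f(s, rho, sigma):
    """f(s, rho) = log Tr[rho^{1+s} sigma^{-s}]."""
    return np.log(np.trace(power(rho, 1 + s) @ power(sigma, -s)).real)


def df_ds(s, rho, sigma):
    """Closed-form derivative Tr[rho^{1+s}(log rho - log sigma) sigma^{-s}] / Tr[rho^{1+s} sigma^{-s}]."""
    rho_p, sigma_m = power(rho, 1 + s), power(sigma, -s)
    log_diff = mat_fun(rho, np.log) - mat_fun(sigma, np.log)
    return (np.trace(rho_p @ log_diff @ sigma_m) / np.trace(rho_p @ sigma_m)).real


rng = np.random.default_rng(10)
rho, sigma = random_state(3, rng), random_state(3, rng)
D = rel_entropy(rho, sigma)
for s in [1e-1, 1e-2, 1e-3]:
    print(s, f(s, rho, sigma) / s - D)  # shrinks roughly in proportion to s
```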



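Let $a$ be an arbitrary number satisfying $a > \max_{\rho\in\bar{\mathcal A}}D(\rho\|\sigma) = \sup_{x\in\mathcal X}D(W_x\|\sigma)$. It then follows from the above uniform convergence that there exists an $s_0 > 0$ such that for any $0 < s < s_0$

$$as > \max_{\rho\in\bar{\mathcal A}}f(s,\rho) = \sup_{x\in\mathcal X}\log\mathrm{Tr}\big[W_x^{1+s}\sigma^{-s}\big].$$

Invoking (49), this implies that for any sequence $\mathbf x = \{x^n\} \in \mathbf X$, where $\mathbf X = \{\mathcal X^n\}$ is identified with the product set $\prod_n\mathcal X^n$, we have

$$\lim_{n\to\infty}\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}\sigma^{\otimes n} > 0\}\big] = 0 \quad\text{for all } a > \sup_{x\in\mathcal X}D(W_x\|\sigma), \tag{51}$$

or equivalently

$$\overline D(\mathbf W_{\mathbf x}\|\boldsymbol\sigma) \le \sup_{x\in\mathcal X}D(W_x\|\sigma), \tag{52}$$

where $\mathbf W_{\mathbf x} = \{W^{(n)}_{x^n}\}$ and $\boldsymbol\sigma = \{\sigma^{\otimes n}\}$. Although we assumed $\mathrm{Im}\,\sigma \supseteq \mathrm{Im}\,W_x$ for all $x \in \mathcal X$ above, this inequality is valid for any $\sigma \in \mathcal S(\mathcal H)$ because $\sup_{x\in\mathcal X}D(W_x\|\sigma) = \infty$ if $\mathrm{Im}\,\sigma \not\supseteq \mathrm{Im}\,W_x$ for some $x \in \mathcal X$. Now the desired inequality (46) is derived from the general formula for $C^\dagger(\mathbf W)$ as follows:

$$\begin{aligned}C^\dagger(\mathbf W) &= \max_{\mathbf P}\min_{\boldsymbol\sigma}\overline J(\mathbf P,\boldsymbol\sigma,\mathbf W)\\ &\le \min_{\boldsymbol\sigma}\max_{\mathbf P}\overline J(\mathbf P,\boldsymbol\sigma,\mathbf W)\\ &= \min_{\boldsymbol\sigma}\max_{\mathbf x\in\mathbf X}\overline D(\mathbf W_{\mathbf x}\|\boldsymbol\sigma)\\ &\le \min_{\sigma\in\mathcal S(\mathcal H)}\max_{\mathbf x\in\mathbf X}\overline D(\mathbf W_{\mathbf x}\|\boldsymbol\sigma) \quad\text{with } \boldsymbol\sigma = \{\sigma^{\otimes n}\}\\ &\le \min_{\sigma\in\mathcal S(\mathcal H)}\sup_{x\in\mathcal X}D(W_x\|\sigma) = \sup_{P\in\mathcal P(\mathcal X)}I(P,W),\end{aligned}$$

where the last equality follows from (47).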



7 On the Holevo-Schumacher-Westmoreland decoder

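Let us return to the situation in the proof of Lemma 3, where a probability distribution $P^{(n)}$ and an encoder $\varphi^{(n)}: \{1,\dots,N\} \to \mathcal X^{(n)}$ are given. Instead of the decoder $Y^{(n)}$ used there, consider the following POVM $\hat Y^{(n)}$:

$$\hat Y^{(n)}_i = \Big(\sum_{j=1}^N\tau\upsilon_j\tau\Big)^{-1/2}\tau\upsilon_i\tau\Big(\sum_{j=1}^N\tau\upsilon_j\tau\Big)^{-1/2}, \tag{53}$$

where

$$\tau \stackrel{\rm def}{=} \{W^{(n)}_{P^{(n)}} < e^{-nb}\},\qquad \upsilon_i \stackrel{\rm def}{=} \{W^{(n)}_{\varphi^{(n)}(i)} > e^{-nc}\}.$$

This type of decoder was introduced by Holevo [1] and Schumacher-Westmoreland [2] in proving the direct part of the capacity theorem. Let us investigate this decoder, comparing it with our decoder $Y^{(n)}$ from the proof of Lemma 3.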

Remark 18 More precisely, the decoder treated in [1], [2] was defined by (53) with projections $\tau$ and $\upsilon_i$ of the form

$$\tau = \{e^{-nb'} < W^{(n)}_{P^{(n)}} < e^{-nb}\},\qquad \upsilon_i = \{e^{-nc} < W^{(n)}_{\varphi^{(n)}(i)} < e^{-nd}\},$$

where we have used a slight extension of the notation in (5): for the spectral decomposition $A = \sum_i\lambda_iE_i$,

$$\{\alpha < A < \beta\} = \sum_{i:\,\alpha<\lambda_i<\beta}E_i.$$

However, the asymptotic performance of the decoder does not depend on the particular values of $b'$, $d$ as long as $b'$ is sufficiently large and $d$ is sufficiently small. Hence we set $b' = \infty$ and $d = -\infty$ to simplify the arguments.
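To make the construction (53) and the notation $\{\alpha < A < \beta\}$ of Remark 18 concrete, here is a single-shot sketch in NumPy. It assumes thresholds `t_avg` and `t_word` standing in for $e^{-nb}$ and $e^{-nc}$, uses the uniform average of the codeword states in place of $W^{(n)}_{P^{(n)}}$, and takes $(\cdot)^{-1/2}$ on the support of the sum; the remaining weight $I - \sum_i\hat Y_i$ would be assigned to an extra error outcome to complete the POVM. All function names are hypothetical.

```python
import numpy as np


def random_state(d, rng):
    """Random full-rank density matrix on C^d (Wishart sample plus a small identity part)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T + 0.05 * np.eye(d)
    return rho / np.trace(rho).real


def spectral_projection(a, lo=-np.inf, hi=np.inf):
    """{lo < A < hi}: sum of the eigenprojections of A whose eigenvalues lie in (lo, hi)."""
    w, v = np.linalg.eigh(a)
    vs = v[:, (w > lo) & (w < hi)]
    return vs @ vs.conj().T


def pinv_sqrt(a, tol=1e-12):
    """A^{-1/2} on the support of the PSD matrix A, and 0 on its kernel."""
    w, v = np.linalg.eigh(a)
    inv = np.zeros_like(w)
    mask = w > tol
    inv[mask] = 1.0 / np.sqrt(w[mask])
    return (v * inv) @ v.conj().T


def hsw_decoder(codeword_states, t_avg, t_word):
    """Single-shot analogue of (53): tau = {avg < t_avg}, upsilon_i = {W_i > t_word}."""
    avg = sum(codeword_states) / len(codeword_states)
    tau = spectral_projection(avg, hi=t_avg)
    ops = [tau @ spectral_projection(w, lo=t_word) @ tau for w in codeword_states]
    norm = pinv_sqrt(sum(ops))
    return [norm @ op @ norm for op in ops]


rng = np.random.default_rng(8)
words = [random_state(4, rng) for _ in range(3)]
povm = hsw_decoder(words, t_avg=0.4, t_word=0.05)
total = sum(povm)
print(np.round(np.linalg.eigvalsh(total), 6))  # eigenvalues 0 or 1: a projection below I
```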
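The authors of [2] showed by a rather complicated calculation that the average error probability of the code $\Phi^{(n)} = (N, \varphi^{(n)}, \hat Y^{(n)})$ satisfies

$$P_e[\Phi^{(n)}] \le \frac1N\sum_{i=1}^N\Big(3\,\mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}(I-\tau)\big] + \mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}(I-\upsilon_i)\big] + \sum_{j\ne i}\mathrm{Tr}\big[\tau W^{(n)}_{\varphi^{(n)}(i)}\tau\upsilon_j\big]\Big). \tag{54}$$

Note that a simplified derivation of the inequality with slightly different coefficients was shown in [16]. Applying random coding with respect to $P^{(n)}$ to (54) and noting that

$$\begin{aligned}\sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[\tau W^{(n)}_{P^{(n)}}\tau\{W^{(n)}_{x^n} > e^{-nc}\}\big] &\le \big\|\tau W^{(n)}_{P^{(n)}}\tau\big\|\sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[\{W^{(n)}_{x^n} > e^{-nc}\}\big]\\ &\le e^{-nb}e^{nc} = e^{-n(b-c)},\end{aligned} \tag{55}$$

where $\|\cdot\|$ denotes the operator norm, we see that there exists a code $\Phi^{(n)}$ such that

$$P_e[\Phi^{(n)}] \le 3\,\mathrm{Tr}\big[W^{(n)}_{P^{(n)}}\{W^{(n)}_{P^{(n)}} \ge e^{-nb}\}\big] + \sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} \le e^{-nc}\}\big] + e^{-n(b-c)}N. \tag{56}$$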
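Now, for an arbitrary $\mathbf P = \{P^{(n)}\}$, let

$$\begin{aligned}\underline H(\mathbf W_{\mathbf P}) &\stackrel{\rm def}{=} \sup\Big\{b \;\Big|\; \lim_{n\to\infty}\mathrm{Tr}\big[W^{(n)}_{P^{(n)}}\{W^{(n)}_{P^{(n)}} > e^{-nb}\}\big] = 0\Big\}\\ &= \sup\Big\{b \;\Big|\; \lim_{n\to\infty}\mathrm{Tr}\Big[W^{(n)}_{P^{(n)}}\Big\{-\frac1n\log W^{(n)}_{P^{(n)}} < b\Big\}\Big] = 0\Big\},\end{aligned}$$

$$\begin{aligned}\overline H(\mathbf W\,|\,\mathbf P) &\stackrel{\rm def}{=} \inf\Big\{c \;\Big|\; \lim_{n\to\infty}\sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} < e^{-nc}\}\big] = 0\Big\}\\ &= \inf\Big\{c \;\Big|\; \lim_{n\to\infty}\sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\Big[W^{(n)}_{x^n}\Big\{-\frac1n\log W^{(n)}_{x^n} > c\Big\}\Big] = 0\Big\},\end{aligned}$$

and assume that $\overline H(\mathbf W\,|\,\mathbf P) < \infty$. It then follows from (56) that there exists a sequence of codes $\mathbf\Phi = \{\Phi^{(n)}\}$ such that $\lim_{n\to\infty}P_e[\Phi^{(n)}] = 0$ with the rate $\liminf_{n\to\infty}\frac1n\log|\Phi^{(n)}|$ being arbitrarily close to $\underline H(\mathbf W_{\mathbf P}) - \overline H(\mathbf W\,|\,\mathbf P)$; i.e., we have

$$C(\mathbf W) \ge \max\Big\{\underline H(\mathbf W_{\mathbf P}) - \overline H(\mathbf W\,|\,\mathbf P) \;\Big|\; \overline H(\mathbf W\,|\,\mathbf P) < \infty\Big\}. \tag{57}$$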



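The quantities $\underline H(\mathbf W_{\mathbf P})$ and $\overline H(\mathbf W\,|\,\mathbf P)$ are regarded as information-spectrum analogues of the von Neumann entropy and its conditional version. Indeed, for a stationary memoryless channel $W^{(n)}_{x^n} = W_{x_1}\otimes\cdots\otimes W_{x_n}$ with i.i.d. $P^{(n)}(x^n) = P(x_1)\cdots P(x_n)$, the law of large numbers yields

$$\underline H(\mathbf W_{\mathbf P}) = \overline H(\mathbf W_{\mathbf P}) = H(W_P) \stackrel{\rm def}{=} -\mathrm{Tr}[W_P\log W_P], \tag{58}$$

$$\underline H(\mathbf W\,|\,\mathbf P) = \overline H(\mathbf W\,|\,\mathbf P) = H(W\,|\,P) \stackrel{\rm def}{=} \sum_{x\in\mathcal X}P(x)H(W_x), \tag{59}$$

which leads to $C(\mathbf W) \ge \sup_{P\in\mathcal P(\mathcal X)}\{H(W_P) - H(W\,|\,P)\} = \sup_{P\in\mathcal P(\mathcal X)}I(P,W)$ under the finiteness assumption (35) (cf. Remark 16). This is just what was shown in [1], [2].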
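For a qubit state $\rho = \mathrm{diag}(p, 1-p)$ the quantities in (58) reduce to a binomial computation, because the eigenvalues of $\rho^{\otimes n}$ are $p^k(1-p)^{n-k}$ with multiplicity $\binom nk$. The following Python sketch (an illustration of our own, with hypothetical names) evaluates $\mathrm{Tr}[\rho^{\otimes n}\{-\frac1n\log\rho^{\otimes n} < b\}]$ and shows it approaching $0$ for $b < H(\rho)$ and $1$ for $b > H(\rho)$.

```python
import math


def spectral_entropy_tail(p, n, b):
    """Tr[rho^{(x)n} {-(1/n) log rho^{(x)n} < b}] for the qubit state rho = diag(p, 1-p).

    rho^{(x)n} has eigenvalue p^k (1-p)^(n-k) with multiplicity C(n, k), so the trace is a
    binomial probability; log-space arithmetic avoids overflow for large n.
    """
    total = 0.0
    for k in range(n + 1):
        log_eig = k * math.log(p) + (n - k) * math.log(1 - p)
        if -log_eig / n < b:
            log_mult = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            total += math.exp(log_mult + log_eig)
    return total


p = 0.2
H = -p * math.log(p) - (1 - p) * math.log(1 - p)  # von Neumann entropy in nats
for n in [100, 500, 2000]:
    print(n, spectral_entropy_tail(p, n, H - 0.05), spectral_entropy_tail(p, n, H + 0.05))
```

The first column tends to $0$ and the second to $1$, which is the content of $\underline H = \overline H = H$ in (58) for this example.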

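Remark 19 The operator inequality of Lemma 2 can be applied to the code $\Phi^{(n)} = (N, \varphi^{(n)}, \hat Y^{(n)})$ to derive (57) more straightforwardly than the derivations in [1], [2], [16]. Indeed, letting $c = 1$ (for example) in that inequality we have

$$\begin{aligned}P_e[\Phi^{(n)}] &\le \frac1N\sum_{i=1}^N\Big(2\,\mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}(I - \tau\upsilon_i\tau)\big] + 4\sum_{j\ne i}\mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}\tau\upsilon_j\tau\big]\Big)\\ &\le \frac1N\sum_{i=1}^N\Big(4\,\mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}(I-\tau)\big] + 2\,\mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}(I-\upsilon_i)\big] + 4\sum_{j\ne i}\mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}\tau\upsilon_j\tau\big]\Big),\end{aligned}$$

where the second inequality follows from the next lemma. This leads to (57) in the same way as (54) does.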

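Lemma 6 For any state $\rho$ and any projections $\upsilon$, $\tau$ such that $[\rho,\upsilon] = 0$, we have

$$\mathrm{Tr}[\rho\tau\upsilon\tau] \ge \mathrm{Tr}[\rho\upsilon] - 2\,\mathrm{Tr}[\rho(I-\tau)].$$

Proof: Obvious from $0 \le (I-\tau)\upsilon(I-\tau) = \tau\upsilon\tau - \upsilon + (I-\tau)\upsilon + \upsilon(I-\tau)$ and $\rho\upsilon = \upsilon\rho \le \rho$. ■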
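Lemma 6 can be checked numerically with a state $\rho$ and a projection $\upsilon$ that are diagonal in a common basis (so that $[\rho,\upsilon] = 0$) and an arbitrary projection $\tau$. The NumPy sketch below is such a check; the basis, ranks and helper names are arbitrary choices.

```python
import numpy as np


def random_unitary(d, rng):
    """Unitary from the QR decomposition of a complex Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q * (np.diag(r) / np.abs(np.diag(r)))


def lemma6_gap(rho, upsilon, tau):
    """Tr[rho tau upsilon tau] - (Tr[rho upsilon] - 2 Tr[rho (I - tau)]); Lemma 6 says this is >= 0."""
    d = rho.shape[0]
    lhs = np.trace(rho @ tau @ upsilon @ tau).real
    rhs = np.trace(rho @ upsilon).real - 2 * np.trace(rho @ (np.eye(d) - tau)).real
    return lhs - rhs


rng = np.random.default_rng(5)
d = 4
U, V = random_unitary(d, rng), random_unitary(d, rng)
rho = U @ np.diag(rng.dirichlet(np.ones(d))) @ U.conj().T    # state diagonal in basis U
upsilon = U @ np.diag([1.0, 1.0, 0.0, 0.0]) @ U.conj().T       # commutes with rho
tau = V[:, :2] @ V[:, :2].conj().T                              # arbitrary rank-2 projection
print(lemma6_gap(rho, upsilon, tau))
```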
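Comparing (57) with (7), it is immediate that

$$\max_{\mathbf P}\underline I(\mathbf P,\mathbf W) \ge \max\Big\{\underline H(\mathbf W_{\mathbf P}) - \overline H(\mathbf W\,|\,\mathbf P) \;\Big|\; \overline H(\mathbf W\,|\,\mathbf P) < \infty\Big\}.$$

Actually, a slightly stronger assertion holds:

Theorem 3 For every $\mathbf P$ with $\overline H(\mathbf W\,|\,\mathbf P) < \infty$ we have

$$\underline I(\mathbf P,\mathbf W) \ge \underline H(\mathbf W_{\mathbf P}) - \overline H(\mathbf W\,|\,\mathbf P). \tag{60}$$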



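Proof: It suffices to show that for any $b < \underline H(\mathbf W_{\mathbf P})$, $c > \overline H(\mathbf W\,|\,\mathbf P)$ and $\varepsilon > 0$ we have $\underline I(\mathbf P,\mathbf W) \ge b - c - \varepsilon$, or, equivalently, that if

$$\lim_{n\to\infty}\mathrm{Tr}\big[W^{(n)}_{P^{(n)}}\{W^{(n)}_{P^{(n)}} > e^{-nb}\}\big] = 0 \tag{61}$$

and

$$\lim_{n\to\infty}\sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} > e^{-nc}\}\big] = 1, \tag{62}$$

then

$$\lim_{n\to\infty}\sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{n(b-c-\varepsilon)}W^{(n)}_{P^{(n)}} > 0\}\big] = 1. \tag{63}$$

With $\tau$ as in (53) and $\upsilon_{x^n} \stackrel{\rm def}{=} \{W^{(n)}_{x^n} > e^{-nc}\}$, we obtain

$$\begin{aligned}&\sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{n(b-c-\varepsilon)}W^{(n)}_{P^{(n)}} > 0\}\big]\\ &\quad\ge \sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[(W^{(n)}_{x^n} - e^{n(b-c-\varepsilon)}W^{(n)}_{P^{(n)}})\{W^{(n)}_{x^n} - e^{n(b-c-\varepsilon)}W^{(n)}_{P^{(n)}} > 0\}\big]\\ &\quad\ge \sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[(W^{(n)}_{x^n} - e^{n(b-c-\varepsilon)}W^{(n)}_{P^{(n)}})\,\tau\upsilon_{x^n}\tau\big]\\ &\quad\ge \sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\upsilon_{x^n}\big] - 2\,\mathrm{Tr}\big[W^{(n)}_{P^{(n)}}(I-\tau)\big] - e^{n(b-c-\varepsilon)}\sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[\tau W^{(n)}_{P^{(n)}}\tau\upsilon_{x^n}\big]\\ &\quad\ge \sum_{x^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} > e^{-nc}\}\big] - 2\,\mathrm{Tr}\big[W^{(n)}_{P^{(n)}}\{W^{(n)}_{P^{(n)}} \ge e^{-nb}\}\big] - e^{-n\varepsilon},\end{aligned}$$

where the second inequality follows from (25), the third from Lemma 6 and the last from (55). Now it is clear that (61) and (62) imply (63). ■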
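The second inequality in the chain above is attributed to (25), which is not reproduced in this section; we read it as the elementary fact that $\mathrm{Tr}[A\{A > 0\}] \ge \mathrm{Tr}[AT]$ for Hermitian $A$ and $0 \le T \le I$ (here $A = W^{(n)}_{x^n} - e^{n(b-c-\varepsilon)}W^{(n)}_{P^{(n)}}$ and $T = \tau\upsilon_{x^n}\tau$). The NumPy sketch checks this reading, together with the trivial first step, on random single-shot instances; the factor `k`, the thresholds and the names are assumptions.

```python
import numpy as np


def random_state(d, rng):
    """Random full-rank density matrix on C^d (Wishart sample plus a small identity part)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T + 0.05 * np.eye(d)
    return rho / np.trace(rho).real


def spectral_projection(a, lo=-np.inf, hi=np.inf):
    """{lo < A < hi}: sum of the eigenprojections of A whose eigenvalues lie in (lo, hi)."""
    w, v = np.linalg.eigh(a)
    vs = v[:, (w > lo) & (w < hi)]
    return vs @ vs.conj().T


def chain_terms(w_x, w_p, k, t_op):
    """Return (Tr[W_x Pi], Tr[A Pi], Tr[A T]) with A = W_x - k W_P and Pi = {A > 0}."""
    a = w_x - k * w_p
    pi = spectral_projection(a, lo=0.0)
    return (np.trace(w_x @ pi).real, np.trace(a @ pi).real, np.trace(a @ t_op).real)


rng = np.random.default_rng(6)
w_x, w_p = random_state(3, rng), random_state(3, rng)
tau = spectral_projection(w_p, hi=np.linalg.eigvalsh(w_p).max())     # drop the top eigenvector of W_P
upsilon = spectral_projection(w_x, lo=np.linalg.eigvalsh(w_x).min())  # drop the bottom eigenvector of W_x
print(chain_terms(w_x, w_p, 1.5, tau @ upsilon @ tau))  # nonincreasing from left to right
```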

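Remark 20 Theorem 3 enables us to derive the direct part (41) for a stationary memoryless channel from the general formula (7) via equations (58) and (59). This is essentially equivalent to the simplification of Holevo-Schumacher-Westmoreland's proof explained in Remark 19, but it can also be regarded as a variation of the scenario of section 6 to derive (41) from (7) via $\underline D(\boldsymbol\rho\|\boldsymbol\sigma) \ge D(\rho\|\sigma)$ for $\boldsymbol\rho = \{\rho^{\otimes n}\}_{n=1}^\infty$ and $\boldsymbol\sigma = \{\sigma^{\otimes n}\}_{n=1}^\infty$. That is, just in parallel with the proof of Theorem 3 we can show for any sequences of states $\boldsymbol\rho = \{\rho^{(n)}\}$ and $\boldsymbol\sigma = \{\sigma^{(n)}\}$ that

$$\underline D(\boldsymbol\rho\|\boldsymbol\sigma) \ge \underline K(\boldsymbol\rho\|\boldsymbol\sigma) - \overline H(\boldsymbol\rho),$$

where

$$\underline K(\boldsymbol\rho\|\boldsymbol\sigma) \stackrel{\rm def}{=} \sup\Big\{b \;\Big|\; \lim_{n\to\infty}\mathrm{Tr}\Big[\rho^{(n)}\Big\{-\frac1n\log\sigma^{(n)} < b\Big\}\Big] = 0\Big\},$$

$$\overline H(\boldsymbol\rho) \stackrel{\rm def}{=} \inf\Big\{c \;\Big|\; \lim_{n\to\infty}\mathrm{Tr}\Big[\rho^{(n)}\Big\{-\frac1n\log\rho^{(n)} > c\Big\}\Big] = 0\Big\},$$

which yields for $\boldsymbol\rho = \{\rho^{\otimes n}\}_{n=1}^\infty$ and $\boldsymbol\sigma = \{\sigma^{\otimes n}\}_{n=1}^\infty$ that $\underline D(\boldsymbol\rho\|\boldsymbol\sigma) \ge -\mathrm{Tr}[\rho\log\sigma] - H(\rho) = D(\rho\|\sigma)$. Combining this argument, which provides another simple proof of the direct part of the quantum Stein's lemma (cf. [10]), with the scenario of section 6 is equivalent to the direct use of Theorem 3 mentioned above.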

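Remark 21 The classical counterpart of (60) is rather obvious:

$$\begin{aligned}\underline I(\mathbf X;\mathbf Y) &= \text{p-}\liminf_{n\to\infty}\frac1n\log\frac{W^{(n)}(Y^{(n)}|X^{(n)})}{P_{Y^{(n)}}(Y^{(n)})}\\ &\ge \text{p-}\liminf_{n\to\infty}\frac1n\log\frac{1}{P_{Y^{(n)}}(Y^{(n)})} - \text{p-}\limsup_{n\to\infty}\frac1n\log\frac{1}{W^{(n)}(Y^{(n)}|X^{(n)})}\\ &= \underline H(\mathbf Y) - \overline H(\mathbf Y|\mathbf X).\end{aligned}$$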



8 Capacity under cost constraint 

The cost constraint problem in the general setting is trivial, as in the case of classical information-spectrum methods [5]. Namely, given a sequence $\mathbf W = \{W^{(n)}\}$ of channels $W^{(n)}: \mathcal X^{(n)} \to \mathcal S(\mathcal H^{(n)})$ as well as a sequence $\mathbf c = \{c^{(n)}\}$ of functions $c^{(n)}: \mathcal X^{(n)} \to \mathbb R$, which are called cost functions, and a real number $\gamma$, the capacity under cost constraint is nothing but the capacity $C(\mathbf W|_{\mathbf c,\gamma})$ of the sequence of channels $\mathbf W|_{\mathbf c,\gamma} = \{W^{(n)}|_{c^{(n)},\gamma}\}$, where $W^{(n)}|_{c^{(n)},\gamma}$ is the restriction

$$W^{(n)}|_{c^{(n)},\gamma}: \mathcal X^{(n)}_{c^{(n)},\gamma} \ni x \mapsto W^{(n)}_x$$

of the original channel $W^{(n)}$ to

$$\mathcal X^{(n)}_{c^{(n)},\gamma} \stackrel{\rm def}{=} \{x \in \mathcal X^{(n)} \mid c^{(n)}(x) \le n\gamma\}. \tag{64}$$

In addition, the strong converse property in this case is represented as $C(\mathbf W|_{\mathbf c,\gamma}) = C^\dagger(\mathbf W|_{\mathbf c,\gamma})$. Needless to say, we can apply the general formulas in Theorem 1 to these quantities.

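Now let us consider the situation where $\mathbf W = \{W^{(n)}\}$ is the stationary memoryless extension of $W: \mathcal X \to \mathcal S(\mathcal H)$ and $\mathbf c = \{c^{(n)}\}$ is the additive extension

$$c^{(n)}(x^n) = \sum_{i=1}^n c(x_i),$$

where $c$ is a function $\mathcal X \to \mathbb R$. We shall prove the following theorem, which was essentially obtained by Holevo [16], [33] except for the strong converse part.

Theorem 4 In the stationary memoryless case with the additive cost, we have

$$C(\mathbf W|_{\mathbf c,\gamma}) = \sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)} I(P,W), \tag{65}$$

where

$$\mathcal P_{c,\gamma}(\mathcal X) \stackrel{\rm def}{=} \Big\{P \in \mathcal P(\mathcal X) \;\Big|\; E_P[c] \stackrel{\rm def}{=} \sum_x P(x)c(x) \le \gamma\Big\}.$$

If, in addition, $\dim\mathcal H < \infty$, then the strong converse holds: $C^\dagger(\mathbf W|_{\mathbf c,\gamma}) = C(\mathbf W|_{\mathbf c,\gamma})$.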
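We first show that the (weak) converse part

$$C(\mathbf W|_{\mathbf c,\gamma}) \le \sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)} I(P,W) \tag{66}$$

is derived from the general formula. Let $\mathcal P^{(n)} \stackrel{\rm def}{=} \mathcal P(\mathcal X^{(n)}_{c^{(n)},\gamma})$ be the totality of probability distributions on $\mathcal X^{(n)} = \mathcal X^n$ whose supports are finite subsets of

$$\mathcal X^{(n)}_{c^{(n)},\gamma} = \Big\{x^n \in \mathcal X^n \;\Big|\; \sum_{i=1}^n c(x_i) \le n\gamma\Big\}.$$

For any $P^{(n)} \in \mathcal P^{(n)}$ and any permutation $\pi$ on $\{1,\dots,n\}$, the distribution $P^{(n)}_\pi$ defined by $P^{(n)}_\pi(x_1,\dots,x_n) = P^{(n)}(x_{\pi(1)},\dots,x_{\pi(n)})$ also belongs to $\mathcal P^{(n)}$ and satisfies $I(P^{(n)}_\pi, W^{(n)}) = I(P^{(n)}, W^{(n)})$. Since $I(P^{(n)}, W^{(n)})$ is concave with respect to $P^{(n)}$, averaging over all permutations does not decrease it, so we can restrict ourselves to symmetric distributions when considering $\sup_{P^{(n)}\in\mathcal P^{(n)}} I(P^{(n)}, W^{(n)})$. For a symmetric $P^{(n)} \in \mathcal P^{(n)}$, the marginal distribution $P$ on $\mathcal X$ belongs to $\mathcal P_{c,\gamma}(\mathcal X)$ (since $E_P[c] = \frac1n E_{P^{(n)}}[c^{(n)}] \le \gamma$) and satisfies $I(P^{(n)}, W^{(n)}) \le nI(P,W)$ by the subadditivity of the von Neumann entropy. Hence we have

$$\sup_{P^{(n)}\in\mathcal P^{(n)}} I(P^{(n)}, W^{(n)}) \le n\sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)} I(P,W),$$

and (66) follows from Lemma 5 and (7) as in the costless case.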
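Next, let us consider the direct part

$$C(\mathbf W|_{\mathbf c,\gamma}) \ge \sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)} I(P,W). \tag{67}$$

We use a slight modification of Lemma 3 as follows. Let $P$ be a probability distribution in $\mathcal P_{c,\gamma}(\mathcal X)$ and $a$ be a real number. Given an arbitrary encoder $\varphi^{(n)}: \{1,\dots,N\} \to \mathcal X^{(n)}_{c^{(n)},\gamma}$, let the decoder $Y^{(n)} = \{Y^{(n)}_1,\dots,Y^{(n)}_N\}$ be defined by

$$Y^{(n)}_i = \Big(\sum_{j=1}^N\pi_j\Big)^{-1/2}\pi_i\Big(\sum_{j=1}^N\pi_j\Big)^{-1/2},$$

where $\pi_i \stackrel{\rm def}{=} \{W^{(n)}_{\varphi^{(n)}(i)} - e^{na}W_P^{\otimes n} > 0\}$. It then follows from Lemma 2 with $c = 1$ (for example) that the average error probability of the code $\Phi^{(n)} = (N, \varphi^{(n)}, Y^{(n)})$ is bounded by

$$P_e[\Phi^{(n)}] \le \frac2N\sum_{i=1}^N\mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}(I-\pi_i)\big] + \frac4N\sum_{i=1}^N\sum_{j\ne i}\mathrm{Tr}\big[W^{(n)}_{\varphi^{(n)}(i)}\pi_j\big].$$

Now let $P^{(n)}$ be the $n$th i.i.d. extension of $P$ and $\tilde P^{(n)} \in \mathcal P(\mathcal X^{(n)}_{c^{(n)},\gamma})$ be defined by

$$\tilde P^{(n)}(x^n) = P^{(n)}(x^n)/K_n \quad\text{for } x^n \in \mathcal X^{(n)}_{c^{(n)},\gamma}, \tag{68}$$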
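where $K_n \stackrel{\rm def}{=} P^{(n)}(\mathcal X^{(n)}_{c^{(n)},\gamma})$. Note that due to the assumption $P \in \mathcal P_{c,\gamma}(\mathcal X)$ and to the central limit theorem we have

$$\liminf_{n\to\infty}K_n \ge \frac12.$$

Generating the encoder $\varphi^{(n)}$ randomly according to the distribution $\tilde P^{(n)}$, we see that there exists a code $\Phi^{(n)}$ for $W^{(n)}|_{c^{(n)},\gamma}$ of size $N$ satisfying

$$\begin{aligned}P_e[\Phi^{(n)}] &\le 2\sum_{x^n\in\mathcal X^{(n)}_{c^{(n)},\gamma}}\tilde P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W_P^{\otimes n} \le 0\}\big]\\ &\quad + 4N\sum_{x^n\in\mathcal X^{(n)}_{c^{(n)},\gamma}}\tilde P^{(n)}(x^n)\,\mathrm{Tr}\Big[\Big(\sum_{x'^n\in\mathcal X^{(n)}_{c^{(n)},\gamma}}\tilde P^{(n)}(x'^n)W^{(n)}_{x'^n}\Big)\{W^{(n)}_{x^n} - e^{na}W_P^{\otimes n} > 0\}\Big]\\ &\le \frac{2}{K_n}\sum_{x^n\in\mathcal X^n}P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W_P^{\otimes n} \le 0\}\big] + \frac{4}{K_n^2}Ne^{-na}.\end{aligned} \tag{69}$$

Thus, letting $\mathbf P = \{P^{(n)}\}$ and recalling (69) we have

$$C(\mathbf W|_{\mathbf c,\gamma}) \ge \underline I(\mathbf P,\mathbf W) = I(P,W),$$

where the last equality follows from (43). We have thus proved (67).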
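The bound on $K_n$ used in the direct part can be illustrated exactly for integer-valued costs by a dynamic program over the accumulated cost. In the sketch below (Python with `fractions`; the binary alphabet, the costs $c(0) = 0$, $c(1) = 1$ and $\gamma = 1/2$ are illustrative assumptions, and the function name is hypothetical), $K_n$ stays at or above $1/2$ in the boundary case $E_P[c] = \gamma$ and tends to $1$ when $E_P[c] < \gamma$.

```python
from fractions import Fraction


def prob_cost_within_budget(p, cost, n, gamma):
    """K_n = P^n( sum_i c(x_i) <= n*gamma ) for i.i.d. inputs, computed exactly.

    p: dict symbol -> Fraction probability; cost: dict symbol -> integer cost.
    """
    dist = {0: Fraction(1)}  # distribution of the accumulated cost after each letter
    for _ in range(n):
        new = {}
        for total, w in dist.items():
            for x, px in p.items():
                key = total + cost[x]
                new[key] = new.get(key, Fraction(0)) + w * px
        dist = new
    budget = n * gamma
    return sum((w for total, w in dist.items() if total <= budget), Fraction(0))


half = Fraction(1, 2)
cost = {0: 0, 1: 1}
p_tight = {0: half, 1: half}                      # E_P[c] = 1/2 = gamma (boundary case)
p_slack = {0: Fraction(3, 4), 1: Fraction(1, 4)}  # E_P[c] = 1/4 < gamma
for n in [1, 2, 5, 10, 40]:
    print(n, float(prob_cost_within_budget(p_tight, cost, n, half)),
          float(prob_cost_within_budget(p_slack, cost, n, half)))
```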

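Remark 22 For the sequence $\tilde{\mathbf P} = \{\tilde P^{(n)}\}$ defined from a $P \in \mathcal P_{c,\gamma}(\mathcal X)$ by (68), the general formula (7) implies that

$$\begin{aligned}C(\mathbf W|_{\mathbf c,\gamma}) &\ge \underline I(\tilde{\mathbf P},\mathbf W)\\ &= \sup\Big\{a \;\Big|\; \lim_{n\to\infty}\sum_{x^n}\tilde P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W^{(n)}_{\tilde P^{(n)}} \le 0\}\big] = 0\Big\}\\ &\ge \sup\Big\{a \;\Big|\; \lim_{n\to\infty}\sum_{x^n\in\mathcal X^{(n)}_{c^{(n)},\gamma}}\tilde P^{(n)}(x^n)\,\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}W_P^{\otimes n} \le 0\}\big] = 0\Big\}\\ &= \underline J(\tilde{\mathbf P},\boldsymbol\sigma,\mathbf W) \quad\text{for } \tilde{\mathbf P} = \{\tilde P^{(n)}\} \text{ and } \boldsymbol\sigma = \{W_P^{\otimes n}\},\end{aligned} \tag{70}$$

where the second inequality follows from (55). So, if we could use $\underline I(\mathbf P,\mathbf W) = \min_{\boldsymbol\sigma}\underline J(\mathbf P,\boldsymbol\sigma,\mathbf W)$, which is merely a conjecture at present, the inequality in (70) could be derived from the general formula as in the classical case.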






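Let us proceed to the proof of the strong converse part

$$C^\dagger(\mathbf W|_{\mathbf c,\gamma}) \le \sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)} I(P,W) \tag{71}$$

under the assumption that $\dim\mathcal H < \infty$. We claim that for any $\mathbf x \in \mathbf X = \{\mathcal X^{(n)}_{c^{(n)},\gamma}\}$ and any $\sigma \in \mathcal S(\mathcal H)$,

$$\overline D(\mathbf W_{\mathbf x}\|\boldsymbol\sigma) \le \sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)} J(P,\sigma,W), \tag{72}$$

where $\mathbf W_{\mathbf x} = \{W^{(n)}_{x^n}\}$ and $\boldsymbol\sigma = \{\sigma^{\otimes n}\}$. We only need to show this for $\sigma$ such that $\mathrm{Im}\,\sigma \supseteq \mathrm{Im}\,W_x$ for all $x \in \mathrm{supp}(P)$ and all $P \in \mathcal P_{c,\gamma}(\mathcal X)$, since the right-hand side is $\infty$ otherwise. For any $x^n \in \mathcal X^{(n)}_{c^{(n)},\gamma}$ and any real numbers $a$ and $0 \le s \le 1$, it follows from (48) that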



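$$\begin{aligned}\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}\sigma^{\otimes n} > 0\}\big] &\le \exp\Big[-n\Big(as - \frac1n\sum_{i=1}^n\log\mathrm{Tr}\big[W_{x_i}^{1+s}\sigma^{-s}\big]\Big)\Big]\\ &\le \exp\big[-n(as - \phi(s))\big],\end{aligned} \tag{73}$$

where

$$\phi(s) \stackrel{\rm def}{=} \sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)}\sum_x P(x)\log\mathrm{Tr}\big[W_x^{1+s}\sigma^{-s}\big]$$

(the second inequality in (73) holds because the empirical distribution of $x^n$ belongs to $\mathcal P_{c,\gamma}(\mathcal X)$). Let

$$\mathcal P_{c,\gamma,2}(\mathcal X) = \{P \mid P \in \mathcal P_{c,\gamma}(\mathcal X) \text{ and } |\mathrm{supp}(P)| \le 2\},$$

where $|\mathrm{supp}(P)|$ denotes the number of elements of the support of $P$. Then an argument similar to section IV of [18] proves that $\mathcal P_{c,\gamma}(\mathcal X)$ is the convex hull of $\mathcal P_{c,\gamma,2}(\mathcal X)$; see Appendix II. Hence we have

$$\phi(s) = \sup_{P\in\mathcal P_{c,\gamma,2}(\mathcal X)}\sum_x P(x)\log\mathrm{Tr}\big[W_x^{1+s}\sigma^{-s}\big] = \max_{\omega\in\bar\Omega}g(s,\omega),$$

where $\bar\Omega$ is the compact subset of $[0,1]\times\mathcal S(\mathcal H)^2$ defined as the closure of

$$\Omega = \{(\lambda, W_{x_1}, W_{x_2}) \mid 0 \le \lambda \le 1,\ (x_1,x_2) \in \mathcal X^2,\ \lambda c(x_1) + (1-\lambda)c(x_2) \le \gamma\}$$

and

$$g(s,(\lambda,\rho_1,\rho_2)) \stackrel{\rm def}{=} \lambda\log\mathrm{Tr}\big[\rho_1^{1+s}\sigma^{-s}\big] + (1-\lambda)\log\mathrm{Tr}\big[\rho_2^{1+s}\sigma^{-s}\big].$$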
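An argument similar to the derivation of (51), applied to (73), yields

$$\lim_{n\to\infty}\mathrm{Tr}\big[W^{(n)}_{x^n}\{W^{(n)}_{x^n} - e^{na}\sigma^{\otimes n} > 0\}\big] = 0 \quad\text{for all } a > \max_{\omega\in\bar\Omega}\frac{\partial}{\partial s}g(0,\omega) = \sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)}J(P,\sigma,W),$$

which proves the claim (72). Now the strong converse (71) is derived as follows:

$$\begin{aligned}C^\dagger(\mathbf W|_{\mathbf c,\gamma}) &= \max_{\mathbf P}\min_{\boldsymbol\sigma}\overline J(\mathbf P,\boldsymbol\sigma,\mathbf W|_{\mathbf c,\gamma})\\ &\le \min_{\boldsymbol\sigma}\max_{\mathbf P}\overline J(\mathbf P,\boldsymbol\sigma,\mathbf W|_{\mathbf c,\gamma})\\ &= \min_{\boldsymbol\sigma}\max_{\mathbf x\in\mathbf X}\overline D(\mathbf W_{\mathbf x}\|\boldsymbol\sigma)\\ &\le \min_{\sigma\in\mathcal S(\mathcal H)}\max_{\mathbf x\in\mathbf X}\overline D(\mathbf W_{\mathbf x}\|\boldsymbol\sigma) \quad\text{with } \boldsymbol\sigma = \{\sigma^{\otimes n}\}\\ &\le \min_{\sigma\in\mathcal S(\mathcal H)}\sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)}J(P,\sigma,W)\\ &= \sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)}\min_{\sigma\in\mathcal S(\mathcal H)}J(P,\sigma,W) = \sup_{P\in\mathcal P_{c,\gamma}(\mathcal X)}I(P,W),\end{aligned}$$

where we have invoked the fact that relations similar to (47) hold in the present situation.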

9 Concluding remarks 

We have obtained a general formula for the capacity of classical-quantum channels together with a characterization of the strong converse property by extending the information-spectrum method to the quantum setting. The general results have been applied to the stationary memoryless case with or without a cost constraint on inputs, whereby new simple proofs have been given for the corresponding coding theorems. Among the many open problems concerning the present work, we recall here only the following two: one is the conjecture $\underline I(\mathbf P,\mathbf W) = \min_{\boldsymbol\sigma}\underline J(\mathbf P,\boldsymbol\sigma,\mathbf W)$ mentioned in Remark 22, and the other is how to analyze (if possible) the asymptotics of the quantum information spectrum directly, not by way of the theory of quantum hypothesis testing. These problems will be important for the further development of the quantum information-spectrum method.
Appendix I Proof of Lemma 1

Let us begin with the attainability of $c \stackrel{\rm def}{=} \sup_{f\in\mathcal F}[f]^-_x$. We assume $-\infty < c < \infty$ first. Then for every natural number $k$ there exists $f^{(k)} = \{f^{(k)}_n\}_{n=1}^\infty \in \mathcal F$ such that $[f^{(k)}]^-_x > c - \frac1k$. This implies that

$$\limsup_{n\to\infty}f^{(k)}_n\Big(c - \frac1k\Big) \le x,$$

and hence there exists $n_k$ such that for any $n \ge n_k$,

$$f^{(k)}_n\Big(c - \frac1k\Big) < x + \frac1k.$$

Let us choose $\{n_k\}$ to satisfy $n_k < n_{k+1}$ ($\forall k$). Then every $n \ge n_1$ uniquely determines a number $k$ such that $n_k \le n < n_{k+1}$, which we denote by $k = k_n$ (for $n < n_1$ the choice of $f^*_n$ below is arbitrary). Letting $f^*_n \stackrel{\rm def}{=} f^{(k_n)}_n \in \mathcal F_n$ and $f^* = \{f^*_n\}_{n=1}^\infty \in \mathcal F$, we have

$$f^*_n\Big(c - \frac1{k_n}\Big) < x + \frac1{k_n}.$$

This implies that $\limsup_{n\to\infty}f^*_n(c - \varepsilon) \le x$ for any $\varepsilon > 0$, and therefore we have $[f^*]^-_x = c = \sup_{\mathcal F}[f]^-_x$. Next, let us consider the case when $c = \infty$. Then for every natural number $k$ there exists $f^{(k)} = \{f^{(k)}_n\}_{n=1}^\infty \in \mathcal F$ such that $[f^{(k)}]^-_x > k$, which implies the existence of a number $n_k$ such that for any $n \ge n_k$ we have $f^{(k)}_n(k) < x + \frac1k$. Then an argument similar to the previous one applies to the construction of a sequence $f^* = \{f^*_n\} \in \mathcal F$ satisfying $\limsup_{n\to\infty}f^*_n(k) \le x$ for any $k$, and therefore we have $[f^*]^-_x = \infty = \sup_{\mathcal F}[f]^-_x$. The remaining case $c = -\infty$ is trivial, since this means that $[f]^-_x = -\infty$ for all $f \in \mathcal F$.

Let us proceed to the attainability of $c \stackrel{\rm def}{=} \sup_{\mathcal F}[f]^+_x$. Assume $-\infty < c < \infty$. Then for every $k$ there exists $f^{(k)} = \{f^{(k)}_n\}_{n=1}^\infty \in \mathcal F$ such that $[f^{(k)}]^+_x > c - \frac1k$. This implies that

$$\liminf_{n\to\infty}f^{(k)}_n\Big(c - \frac1k\Big) < x,$$

and hence there exists a $\delta_k > 0$ such that the set

$$A_k = \Big\{n \;\Big|\; f^{(k)}_n\Big(c - \frac1k\Big) < x - \delta_k\Big\}$$

has infinitely many elements. Let $\{B_k\}_{k=1}^\infty$ be a family of subsets $B_k \subset A_k$ such that $|B_k| = \infty$ and $B_k \cap B_l = \emptyset$ for all $k \ne l$, and let $f^* = \{f^*_n\}_{n=1}^\infty$ be defined by

$$f^*_n = \begin{cases}f^{(k)}_n & \text{if } n \in B_k,\\ \text{an arbitrary element of } \mathcal F_n & \text{if } n \notin \bigcup_k B_k.\end{cases}$$

Then for every $k$ the set $\{n \mid f^*_n(c - \frac1k) < x - \delta_k\}$ includes $B_k$ as a subset and hence has infinitely many elements. This leads to $\liminf_{n\to\infty}f^*_n(c - \varepsilon) < x$ for any $\varepsilon > 0$, and therefore we have $[f^*]^+_x = c = \sup_{\mathcal F}[f]^+_x$. The case $c = \infty$ can be proved similarly, and the case $c = -\infty$ is trivial.

Letting $\mathcal G_n$ be the set of monotonically nondecreasing functions $g_n(a) \stackrel{\rm def}{=} -f_n(-a)$ for $f_n \in \mathcal F_n$, we have

$$\inf_{\mathcal F}[f]^-_x = -\sup_{\mathcal G}[g]^+_{-x} \quad\text{and}\quad \inf_{\mathcal F}[f]^+_x = -\sup_{\mathcal G}[g]^-_{-x}.$$

The attainability of the infimums thus follows from that of the supremums.

Appendix II Proof that $\mathcal P_{c,\gamma}(\mathcal X)$ is the convex hull of $\mathcal P_{c,\gamma,2}(\mathcal X)$

Let $P$ be an arbitrary distribution in $\mathcal P_{c,\gamma}(\mathcal X)$, and let $\mathcal R_P$ denote the subset of $\mathcal P_{c,\gamma}(\mathcal X)$ consisting of all distributions $P'$ satisfying $E_{P'}[c] = E_P[c]$ and $\mathrm{supp}(P') \subseteq \mathrm{supp}(P)$. Since $\mathcal R_P$ is convex and compact, the element $P$ of $\mathcal R_P$ can be represented as a convex combination of extreme points of $\mathcal R_P$. Hence it suffices to show that the support of every extreme point of $\mathcal R_P$ has at most two elements. Suppose that a $P' \in \mathcal R_P$ is written as $P' = \sum_{i=1}^k\lambda_i\delta_{x_i}$, where $\{x_1,\dots,x_k\} = \mathrm{supp}(P')$ and $\lambda_i \stackrel{\rm def}{=} P'(x_i) > 0$. If $k \ge 3$, there exists a nonzero real vector $(\alpha_1,\dots,\alpha_k) \in \mathbb R^k$ such that $\sum_{i=1}^k\alpha_i = 0$ and $\sum_{i=1}^k\alpha_ic(x_i) = 0$. Then, for a sufficiently small $\varepsilon > 0$, $P_1 = \sum_{i=1}^k(\lambda_i + \varepsilon\alpha_i)\delta_{x_i}$ and $P_2 = \sum_{i=1}^k(\lambda_i - \varepsilon\alpha_i)\delta_{x_i}$ become two distinct distributions in $\mathcal R_P$ and satisfy $P' = \frac12(P_1 + P_2)$, which means that $P'$ is not extreme. Therefore, if $P'$ is an extreme point then $k = |\mathrm{supp}(P')| \le 2$.
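The perturbation argument of Appendix II is constructive: repeatedly moving along a direction $\alpha$ with $\sum_i\alpha_i = 0$ and $\sum_i\alpha_ic(x_i) = 0$ until a coordinate vanishes writes $P$ as an explicit convex combination of distributions with at most two support points and the same mean cost. The following exact-arithmetic Python sketch implements this reduction; the function names and the example costs are hypothetical.

```python
from fractions import Fraction


def decompose(p, cost):
    """Write p as a convex combination of distributions with at most two support
    points and the same mean cost, using the perturbation step of Appendix II.

    p: dict symbol -> Fraction (probabilities summing to 1); cost: dict symbol -> int.
    Returns a list of (weight, distribution) pairs.
    """
    support = [x for x, w in p.items() if w > 0]
    if len(support) <= 2:
        return [(Fraction(1), {x: p[x] for x in support})]
    x0, x1, x2 = support[:3]
    c0, c1, c2 = cost[x0], cost[x1], cost[x2]
    # Cross product of (1,1,1) and (c0,c1,c2): sum(alpha) = 0 and sum(alpha * c) = 0.
    alpha = {x0: c2 - c1, x1: c0 - c2, x2: c1 - c0}
    if all(a == 0 for a in alpha.values()):  # equal costs: any zero-sum direction works
        alpha = {x0: 1, x1: -1, x2: 0}
    # Largest steps along +alpha and -alpha that keep all probabilities nonnegative.
    t_plus = min(p[x] / -a for x, a in alpha.items() if a < 0)
    t_minus = min(p[x] / a for x, a in alpha.items() if a > 0)
    q_plus = {x: p[x] + t_plus * alpha.get(x, 0) for x in support}
    q_minus = {x: p[x] - t_minus * alpha.get(x, 0) for x in support}
    # p = w_plus * q_plus + w_minus * q_minus, and each q has a strictly smaller support.
    w_plus = t_minus / (t_plus + t_minus)
    w_minus = t_plus / (t_plus + t_minus)
    return ([(w_plus * w, q) for w, q in decompose(q_plus, cost)]
            + [(w_minus * w, q) for w, q in decompose(q_minus, cost)])


def mean_cost(p, cost):
    """E_P[c] = sum_x P(x) c(x)."""
    return sum(w * cost[x] for x, w in p.items())


p = {x: Fraction(1, 4) for x in range(4)}
cost = {0: 0, 1: 1, 2: 2, 3: 5}
parts = decompose(p, cost)
for w, q in parts:
    print(w, q, mean_cost(q, cost))  # every part has mean cost E_P[c] = 2
```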

References 

[1] A.S. Holevo, "The capacity of the quantum channel with general signal states," 
IEEE Trans. Inform. Theory, vol.44, 269-273, 1998. 

[2] B. Schumacher and M.D. Westmoreland, "Sending classical information via 
noisy quantum channels," Phys. Rev. A, vol.56, 131-138, 1997. 

[3] A.S. Holevo, "Bounds for the quantity of information transmitted by a quantum 
communication channel," Probl. Inform. Transm., vol.9, 177-183, 1973. 

[4] A.S. Holevo, "On the capacity of quantum communication channel," Probl. Inform. Transm., vol. 15, no. 4, pp. 247-253, 1979.

[5] S. Verdú and T.S. Han, "A general formula for channel capacity," IEEE Trans. Inform. Theory, vol.40, 1147-1157, 1994.






[6] T.S. Han, Information-Spectrum Methods in Information Theory, Springer-Verlag, 2003. (The original Japanese edition was published by Baifukan-Press, Tokyo, in 1998.)

[7] A. Winter, "Coding theorem and strong converse for quantum channels," IEEE 
Trans. Inform. Theory, vol.45, 2481-2485, 1999. 

[8] A. S. Holevo, "An analog of the theory of statistical decisions in noncommutative theory of probability," Trudy Moskov. Mat. Obsc., vol. 26, 133-149, 1972. (English translation in Trans. Moscow Math. Soc., vol. 26, 133-149, 1972.)

[9] C.W. Helstrom, Quantum Detection and Estimation Theory, Academic Press, 
New York, 1976. 

[10] H. Nagaoka and M. Hayashi, "An information-spectrum approach to classical and quantum hypothesis testing," LANL e-print quant-ph/0206185, 2002.

[11] F. Hiai and D. Petz, "The proper formula for relative entropy and its asymptotics 
in quantum probability," Commun. Math. Phys., vol. 143, 99-114, 1991. 

[12] T. Ogawa and H. Nagaoka, "Strong converse and Stein's lemma in quantum hypothesis testing," IEEE Trans. Inform. Theory, vol.46, 2428-2433, 2000. LANL e-print quant-ph/9906090, 1999.

[13] T. Ogawa, "A study on the asymptotic property of the hypothesis testing and the channel coding in quantum mechanical systems," Ph.D. dissertation, University of Electro-Communications, 2000 (in Japanese).

[14] T. Ogawa and H. Nagaoka, "A New Proof of the Channel Coding Theorem 
via Hypothesis Testing in Quantum Information Theory," Proc. 2002 IEEE 
International Symposium on Information Theory, p. 73. 2002. 

[15] T. Ogawa and H. Nagaoka, "Strong Converse to the Quantum Channel Coding 
Theorem," IEEE Trans. Inform. Theory, vol.45, 2486-2489, 1999. 

[16] A.S. Holevo, "Coding theorems for quantum channels," LANL e-print quant-ph/9809023, 1998.

[17] A.S. Holevo, "Problems in the mathematical theory of quantum communication 
channels," Rep. Math. Phys., vol.12, no.2, pp.273-278, 1977. 

[18] A. Fujiwara and H. Nagaoka, "Operational capacity and pseudoclassicality of a 
quantum channel," IEEE Trans. Inform. Theory, vol.44, 1071-1086, 1998. 

[19] C.E.Shannon, "Certain results in coding theory for noisy channels," Information 
and Control vol.1, 6-25, 1957. 






[20] D. Blackwell, L. Breiman and A.J. Thomasian, "The capacity of a class of 
channels," Ann. Math. Statist, vol.30, 1229-1241, 1959. 

[21] A. Feinstein, "A new basic theorem of information theory," IRE Trans. PGIT, 
vol.4, 2-22, 1954. 

[22] S. Osawa and H. Nagaoka, "Numerical experiments on the capacity of quantum channel with entangled input states," IEICE Trans., vol.E84-A, 2583-2590, 2001.

[23] P.W. Shor, "Additivity of the classical capacity of entanglement-breaking quantum channels," LANL e-print quant-ph/0201149, 2002.

[24] C. King, "Additivity for a class of unital qubit channels," LANL e-print quant-ph/0103156, 2001 (Jour. Math. Phys., in press).

[25] C. King, "The capacity of the quantum depolarizing channel," LANL e-print quant-ph/0204172, 2002.



[26] A. Uhlmann, "Entropy and Optimal Decompositions of States Relative to a 
Maximal Commutative Subalgebra," Open Systems & Information Dynamics, 
vol.5, 209-228, 1998. 

[27] T. Ogawa and M. Hayashi, "On error exponents in quantum hypothesis testing," LANL e-print quant-ph/0206151, 2002.



[28] M. Ohya, D. Petz and N. Watanabe, "On capacities of quantum channels," Probab. Math. Statist., vol.17, 179-196, 1997.

[29] B. Schumacher and M.D. Westmoreland, "Optimal signal ensembles," Phys. 
Rev. A, vol 63, no.2, 022308, Jan. 2001. 

[30] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, 1981.

[31] R. G. Gallager, Information Theory and Reliable Communication, John Wiley 
& Sons, 1968. 

[32] I. Ekeland and R. Temam, Convex Analysis and Variational Problems, North-Holland, 1976; SIAM, 1999.

[33] A. S. Holevo, "On quantum communication channels with constrained inputs," LANL e-print quant-ph/9705054, 1997.






